Hey Internets! How would you test for copy-paste bugs?

Prompted by this Twitter thread https://twitter.com/Foone/status/1229641258370355200

Basically, someone posted a single line of code on Stack Overflow a few years ago. A developer on the Docker project and a developer at Razer (the gaming device people) each used that same line of code. The code assumes it returns a GUID unique to the application, but the GUID both apps got was the same one. Read the thread to find out why, and why it caused them both pain, then come back and read on. Trust me, it’s not a deep technical reason, but the resulting bug is fatal for an end user.

So, as a tester, how can we detect this specific kind of fault? Unit testing is unlikely to scale well as a solution, but I’m keen to hear if anyone can come up with a good unit test pattern that might catch it. Could we write an integration test or even a component test for this specific problem (not the copy-paste problem, but the GUID problem, or at minimum the “duplicate app launch” problem it’s trying to solve)? Is it realistically possible to statically detect the copy-paste problem in a codebase, Black Duck style?
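
One unit-test pattern that might catch at least the known cases: keep a denylist of GUIDs that appear verbatim in widely copied snippets, and assert that the GUID your app actually uses for its single-instance lock is not on it. A minimal sketch in Python, where `instance_guid()` and every GUID value are hypothetical stand-ins:

```python
import uuid

# GUIDs seen verbatim in widely copied snippets.
# These values are illustrative placeholders, not real leaked GUIDs.
KNOWN_LEAKED_GUIDS = {
    uuid.UUID("00000000-0000-0000-0000-000000000000"),
    uuid.UUID("d7a3f2c1-9b4e-4f60-8a2d-3c5e1b7f0a44"),
}

def instance_guid() -> uuid.UUID:
    """Hypothetical wrapper around whatever the app uses to name its
    single-instance mutex."""
    return uuid.UUID("1b0b4e9e-6f0a-4c1d-9d3e-2f4a5b6c7d8e")  # stand-in value

def test_instance_guid_is_not_a_copy_pasted_one():
    guid = instance_guid()
    assert guid not in KNOWN_LEAKED_GUIDS, (
        "Single-instance GUID matches one published in a public snippet; "
        "two unrelated apps sharing it will block each other from launching."
    )
```

Of course, this only catches GUIDs you already know have leaked, which is part of why I doubt unit testing scales here, but it is cheap to run on every build.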

Whenever I come back to a problem after a long break, I hear fresh ideas, and this is not the first time somebody has copy-pasted Stack Overflow code and injected a nasty bug. I’m keen to see if there are fresh heuristics to follow as a tester.


Won’t be the last time either.

I would think that good code reviews would, if not stop this sort of issue, at least reduce how often it happens. I know that if I come across some code during a review that I don’t understand (like the assembly-GUID reference), I would be asking questions. Unfortunately, good code reviews don’t always happen. And even if they do, it is easy to miss problems which you haven’t encountered before.

I also don’t know if unit testing can uncover (specifically) this sort of problem, but I’m not that good at unit testing. The only thing I can think of is testing against other versions of the same product (do they even have different GUIDs?). I do know that I’ve done that in the past as a workaround for “you can’t run two copies of the same program” features.
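
Building on that, a rough automated check might be to scan two builds (two versions of the same product, or two different products you ship) for GUID-shaped strings and report any overlap. A naive sketch with placeholder file names; a real binary may also hold GUIDs as UTF-16 or raw bytes, which this version would miss:

```python
import re
from pathlib import Path

# Matches the textual 8-4-4-4-12 GUID form in a binary's ASCII strings.
GUID_RE = re.compile(
    rb"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    rb"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def guids_in(path: Path) -> set:
    return set(GUID_RE.findall(path.read_bytes()))

shared = guids_in(Path("app_v1.exe")) & guids_in(Path("app_v2.exe"))
for guid in sorted(shared):
    print("GUID present in both binaries:", guid.decode())
```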

One code-review heuristic that I have used is to examine the way the code is written and, more importantly, documented. If there is a snippet which is very clearly written in a different style (variable names that don’t match the rest of the program are a huge tell, and sometimes coders don’t even remove the original comments from the pasted source), then that particular bit of code needs to be closely examined.
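
For what it’s worth, that heuristic can be crudely automated. A toy sketch that counts snake_case versus camelCase identifiers in a file and flags lines written in the minority convention, on the theory that pasted-in code often stands out exactly this way:

```python
import re
import sys

SNAKE = re.compile(r"\b[a-z]+(?:_[a-z0-9]+)+\b")
CAMEL = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")

def flag_style_outliers(source: str):
    lines = source.splitlines()
    snake = sum(len(SNAKE.findall(line)) for line in lines)
    camel = sum(len(CAMEL.findall(line)) for line in lines)
    # Treat whichever convention is in the minority as the "foreign" style.
    foreign = CAMEL if snake >= camel else SNAKE
    return [(n, line.strip()) for n, line in enumerate(lines, 1) if foreign.search(line)]

if __name__ == "__main__":
    for lineno, text in flag_style_outliers(open(sys.argv[1]).read()):
        print(f"line {lineno}: naming style differs from the rest of the file: {text}")
```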


I think you have nailed it, Brian, because running two “different” copies of the app side by side would have triggered the fatal error. If you are on Windows, the OS will use SxS (side-by-side assemblies) to let you run two copies of your application if they are different versions. But it would require some manual installation, and require the application to be runnable in an “uninstalled” mode, which would take a fair bit of developer work to support. Still, it sounds like one sure-fire way of catching this bug if the app is not hugely entangled with the OS. You would have to install a “golden” release version and then run the developer build somehow in situ, or the other way around. Definitely worth doing it that way if the application is simple and allows running off a USB stick. The way the correct code works would have allowed my check to pass.
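
Once the two builds can run side by side, the check itself is easy to script: launch both, wait, and assert that neither has silently exited because the other holds its single-instance mutex. A sketch, with placeholder paths and an arbitrary grace period:

```python
import subprocess
import time

GOLDEN = r"C:\golden\app.exe"  # installed release build (placeholder path)
DEV = r"C:\build\app.exe"      # portable developer build (placeholder path)

golden = subprocess.Popen([GOLDEN])
dev = subprocess.Popen([DEV])
time.sleep(10)  # grace period for either build to detect a "duplicate" and bail

assert golden.poll() is None, "release build exited when run alongside the dev build"
assert dev.poll() is None, "dev build exited when run alongside the release build"

for proc in (golden, dev):
    proc.terminate()
```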

I am not convinced code review would catch this bug often enough. That line has a tiny bit of code smell to it, but only to an experienced coder. Code review only catches a proportion of defects, and only in some domains. But code review still “feels” more likely than a unit test to catch this kind of bug. I would encourage all testers to read code reviews even if the code is not in a language they are proficient in. The questions I ask when I am reading Objective-C code always sound daft; they are often very stupid questions. But I am forcing the developer to look at the code, and to teach me how to read its runes. So that’s valuable.

I wonder if a simple tool that scrapes Stack Overflow accepted answers and then scans your codebase for matches is a thing that would sell? A bit like a lint rule, but crowd-sourced to support code reviews. I’m only asking on the off chance that a scalable answer is there, but was just out of reach without the help of a tool like AI/ML, for example.
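
For illustration, a back-of-the-envelope sketch of how such a tool might work: fingerprint code lines harvested from accepted answers (the Stack Exchange API or the public data dump could supply those), then scan the codebase for lines whose normalised form matches. Everything here, including the one-entry snippet “database”, is an assumption, not a real product:

```python
import hashlib
import re
from pathlib import Path

def normalise(line: str) -> str:
    # Collapse whitespace so trivial reformatting doesn't hide a match.
    return re.sub(r"\s+", " ", line.strip())

def fingerprint(line: str) -> str:
    return hashlib.sha1(normalise(line).encode()).hexdigest()

# Fingerprints of snippet lines harvested from accepted answers (placeholder).
SNIPPET_DB = {fingerprint("var appGuid = GetAssemblyGuidSomehow();")}

def scan(root: str) -> None:
    for path in Path(root).rglob("*.cs"):
        for n, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if fingerprint(line) in SNIPPET_DB:
                print(f"{path}:{n}: matches a line from a Stack Overflow snippet")

scan("src")
```

Exact line matching like this is brittle (rename one variable and it misses), which is exactly where the AI/ML help might come in.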

Personally, I would be VERY sceptical about such a tool. Prior to my current test/programming work, I did QA testing for hardware systems. One of our most prized possessions, which was fabulous on paper, was an automated visual inspection device. It would look at a printed circuit board once all of the components were in place and advise the user about potential problems. Like I said, it looked great on paper, and NOT having it would have been catastrophic for the company, as it is a standard thing for PCB manufacturers. The problem was in the user interface. The users would get a list of hundreds of “is there an issue here?” warnings for most connections, even if the connection was nearly perfect (say, for example, a chip a few micrometers off-center). This led to the QA tester scrolling past and accepting most of the warnings, and frequently missing actual problems if they weren’t in the “we see this a lot” category. (On that note, other tests caught the missed problems, which is why we always test in more than one way.)

I would expect a similar result from any scraping tool. There would be a lot of false positives, to the point that it would take more time to go through the results than it is worth, and many (if not most) users of such a tool would skip most of the results, missing valuable information.

But it could be useful for somebody. I would just be sceptical.

Thinking back on the code-review question, I do wonder whether copy-paste code happens more or less when people are doing pair or mob programming. I have no experience with those techniques, so I wouldn’t know.

The little code smells are dangerous, as they are frequently ignored. That is probably also a big reason why the original issue wasn’t detected for a decade or more. A single line of code which looks correct wouldn’t trigger my “code smell” reflex, except that I didn’t understand how it worked when I first read it. So unless the reviewer understands or questions every line of code, things WILL GET MISSED. And even if something does trigger the “code smell” reflex, once it gets ignored once, the reviewer will never question it again. (… Cue a story about how I lived down the road from a pig farm once upon a time and wouldn’t notice pig smells for years afterward…)

Yep, a tool that scrapes the web for copies of code would only work if it could include some ML savvy to reduce the noise. I worked at a company with a huge test team where we started to experiment with using ML to parse product logs and point out anomalies. It never quite worked as well as it could have, mainly because the tool took a lot of actual elapsed time to learn, much like humans do when scanning logs to find the cause of a crash or of customer unhappiness. We need to have seen many healthy logs first, so we know what counts as unusual.
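
Even without heavyweight ML, that “seen many healthy logs first” idea can be sketched with plain frequency counting: learn line templates from known-good logs, then flag lines in a suspect log whose template was rarely or never seen. The file names and the threshold below are assumptions:

```python
import re
from collections import Counter
from pathlib import Path

def template(line: str) -> str:
    # Collapse numbers and hex/GUID-ish tokens so only the line's "shape" remains.
    return re.sub(r"[0-9a-fA-F-]{4,}|\d+", "<*>", line.strip())

# Learn what healthy looks like.
healthy = Counter()
for log in Path("healthy_logs").glob("*.log"):
    healthy.update(template(line) for line in log.read_text(errors="ignore").splitlines())

# Flag lines whose template was rarely or never seen in healthy runs.
for n, line in enumerate(Path("crash.log").read_text(errors="ignore").splitlines(), 1):
    if healthy[template(line)] < 2:  # threshold is an arbitrary assumption
        print(f"crash.log:{n}: unusual line: {line.strip()}")
```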

Brian, I am so going to try every move in the book to segue into pig farm stories if we ever do meet in real life.
