How does testing change when 30%+ of code is created by AI?

With claims that an increasing amount of code is written by AI, what does this mean for testing?

Here is one such reference: a claim that Microsoft is generating 30% of its code with AI.

I’ve heard community commentary that testing will become more important in the age of AI, but I’ve yet to hear real tactical strategies and modern thinking about how people are actually doing this.

  • How is testing different in this age?
  • What strategies are being adopted?
  • What tools are we really using?
  • What problems are we finding?
  • Are the types of problems changing?
  • In the age of speed, how can we advocate for risk and safety?
4 Likes

It’s more critical than ever that tests are written by humans. If we delegate both implementation and tests to AI, then we are doomed to build the wrong things correctly (as opposed to asking: did we build the right thing correctly?).

“Did we build the right thing” will forever be a human question.

5 Likes

If AI is generating 30% of the code, you probably need to do 100% more testing to achieve the same quality. But that’s not going to happen. I expect product owners will want to keep all the time saved by using AI to write the code, and they will not want to spend anything on extra testing. If anything, they will use the time saving to generate even more code.

Call me cynical, but I don’t expect product owners to be the slightest bit receptive to our arguments about the increased risk. They have been happy to see software quality decline continuously over the last 15 years or so, so I don’t see why the risks associated with AI coding would change their mind. Releasing new features always trumps quality as far as they are concerned.

Of course there will be a few exceptions to this, but it will be the norm. And some might have a change of heart the first time something goes seriously wrong. But more likely, they will just blame the testers and declare that it won’t happen again despite doing nothing to justify that claim.

2 Likes

Understanding the code is more important than ever. If someone uses AI-generated code without understanding its impact, things will be worse.
When developers write the code themselves, they know how functions, classes, and objects are connected and how data flows between them. With AI-generated code, if developers don’t review it, the software may still run, but there can be knock-on issues elsewhere in the system that may not surface immediately, yet eventually will.

This is one crucial thing, and because of it, unit testing needs to be done more thoroughly. Beyond that, testers would also have to prioritise regression testing and do white-box testing wherever possible, at least for the AI-generated code.
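
To make that concrete, here is a minimal sketch (in TypeScript, using Node’s built-in test runner) of what human-written unit tests around AI-generated code might look like. The `dedupeEmails` function, its requirement, and the edge case are all hypothetical examples, not a prescription.

```typescript
// Hypothetical AI-generated helper plus human-written tests.
// The reviewer focuses on edge cases the generated code (and any
// generated tests) may have skipped.
import { test } from "node:test";
import assert from "node:assert/strict";

// AI-generated implementation under review (hypothetical).
function dedupeEmails(emails: string[]): string[] {
  return [...new Set(emails.map((e) => e.trim()))];
}

test("removes exact duplicates", () => {
  assert.deepEqual(dedupeEmails(["a@x.com", "a@x.com"]), ["a@x.com"]);
});

test("treats case-variant addresses as duplicates", () => {
  // Hypothetical requirement: addresses are case-insensitive.
  // The generated code never lowercases, so this human-written test
  // exposes a gap that a quick skim of the code would miss.
  assert.deepEqual(dedupeEmails(["A@x.com", "a@x.com"]), ["a@x.com"]);
});
```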

And as far as risk and safety are concerned, the only way is awareness: through meetings, brainstorming sessions, talks and presentations, and by highlighting it as one of the most crucial aspects of the application from a security point of view.

2 Likes

From a black box tester’s point of view, nothing would change, would it? You still have to test the software, and it doesn’t matter who built it. As for testing closer to the code: it depends on how the AI writes, e.g., unit tests. Are all the important functions covered by a test? What is the test coverage?

3 Likes

I suspect this will take a few more cycles to work out.

Developers I work with are using AI tools, and I’m already seeing a productivity gain alongside some indications of increased automated coverage.

So far any downsides have not stood out. I caveat this with the fact that almost all the developers I work with are senior, so even if AI is assisting, they will still review and understand any code that gets into the build. This is key to my context; if it were in doubt, I suspect I’d be making bigger changes in strategy.

I hope the use of AI will reduce my time spent on automation; it is already being leveraged for script generation alongside the increased coverage from developers. I have been light on this, though, so others more focused on this area are better placed to give feedback.

For hands-on investigative testing, I think it will take a bit longer.

Example: I often use allocation-based estimates, usually built around team size and product complexity. Even if all other things remain equal, an increase in developer productivity means I need to review my weekly allocations.

We’re building products quicker with the same resources, but does the hands-on testing effort remain the same?

Is there a risk that testing becomes a bottleneck as a result? I’ve held strong views that if testing is ever a bottleneck, you are doing it wrong. This is something, though, that I feel will need adaptation to make sure it continues to be true.

My suspicion is that, over time, I’ll need to move some of my risk investigation down to code level as more tools become available for scanning for risks there. An example could be accessibility scanners at code level, with the scans adding efficiency to my testing.
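
As a sketch of what such a scan could look like, assuming a Playwright-based setup with the axe-core integration (@axe-core/playwright); the URL, rule tags, and zero-violations bar below are placeholders a team would choose for itself.

```typescript
// Sketch: an automated accessibility scan inside an existing UI test run,
// using Playwright with the axe-core integration (@axe-core/playwright).
import { test, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

test("home page has no detectable accessibility violations", async ({ page }) => {
  await page.goto("https://example.com/"); // placeholder URL

  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa"]) // limit the scan to WCAG A/AA rules
    .analyze();

  // Scanners catch only a subset of accessibility issues; this complements,
  // rather than replaces, hands-on investigative testing.
  expect(results.violations).toEqual([]);
});
```

Run as part of the normal test suite, this kind of check adds the efficiency described above without pretending to replace human judgement.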

Even if I don’t run into the “tools in the wrong hands” issue, it’s still going to change things significantly in my view.

Whether my testing efficiency improvements match those of the developers remains to be seen, but it seems a reasonable starting goal.

I’ll be watching for signs of that bottleneck scenario, a very bad case (though unlikely the worst), and looking for ways to avoid it.

3 Likes

That side angle, the expectation that everyone is x times more productive yet paid the same salary as before, is going to impact the market and potentially the testing community as a whole.

1 Like

If you let how the code was generated influence how you test, I think that’s a dangerous game. Efficiency is a demand on testing regardless of AI. The objectives of testing have not changed, only the tools you use to achieve those objectives.

1 Like

AI is not so intelligent that it can genuinely interpret relationships. How many of you have googled for code or coding solutions to resolve a test automation problem? There aren’t that many ways to interact with the OS, web services and/or their functionality.

Testing stays the same: you still need to interpret the results from AI. If it looks right, that doesn’t mean it is right, since it’s based on language models.

1 Like

I think some things won’t change, but other things will.

For example, some problems may be more common with AI and we will learn how to look for them better. Hallucinations and the distrust of data are a couple of generic examples.

2 Likes

Agreed, code has always come into “the product” from sometimes random places and in random ways, so having a not-insignificant inflow from a monoculture is a risk in itself, but we have to defend in much the same way we always have. Dig deeper with tools that go deeper. Great coverage of the security concerns, Andrew.

1 Like

From a black box tester’s point of view, the risk is substantially higher, so you should advocate for more time and budget. You won’t get it, in which case you should explain this in your gap analysis or risk analysis, or whatever it is you do at the end of a project or test phase, to explain the remaining risks to the stakeholders.

Indeed. Developers are already reporting AIs inventing methods that don’t exist in the programming language they have been told to use. The compiler should pick up this particular fault, but I am sure that many other types of fault will not get detected before they get through to the testers.
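
A small hedged illustration of that difference, in TypeScript: the hallucinated array method would be rejected by the compiler, while an equally plausible logic fault compiles cleanly and only testing or review will catch it. The names and the discount requirement are made up for the example.

```typescript
// Illustration: a hallucinated API fails at compile time,
// while a plausible-looking logic fault does not.
const prices: number[] = [19.99, 5.0, 12.5];

// Hypothetical AI suggestion (left commented out): arrays have no such
// method, so `tsc` would reject it with
// "Property 'sumAll' does not exist on type 'number[]'".
// const total = prices.sumAll();

// Suppose the (made-up) requirement is a 10% discount on every item.
// This compiles and runs, but only discounts the first item; no compiler
// will catch it, so it falls to tests and reviewers.
const discounted = prices.map((p, i) => (i === 0 ? p * 0.9 : p));
console.log(discounted); // [17.991, 5, 12.5]
```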

It’s well worth reading developer reports on their real world experience of using AIs for coding. Many such reports are posted every day on LinkedIn.