What is defect clustering?

We have heard that when an issue is fixed in one module of an application, we have to test other modules as well. Sometimes a few modules need to be tested every time. Why is that so?


Hello @ashugupta34480!

Multiple modules may require a review such as a regression test because of dependence on or between modules. That is, the result or output of one module is used in another module, or the data collected early in a workflow is used later in the workflow.

To understand the impact of a change in one module, modules that have a dependency may require a review to determine if the change has altered the behavior of the dependent modules.

To take this a step further, dependency is also a testability challenge. If there is a high degree of dependency in a system, then it requires more effort to evaluate. As a tester, I want to advocate for less dependency in design reviews or during implementation. In that manner, we work with project team members to simplify designs, reduce dependencies, and improve testing efficiency.
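To make the dependency point concrete, here is a minimal sketch of how the modules needing a regression review could be derived from a dependency map. The module names and the `modules_to_retest` helper are made up for illustration; a real project would build the map from its own build or architecture data.

```python
from collections import deque


def modules_to_retest(depends_on, changed):
    """Return every module that directly or indirectly depends on `changed`."""
    # Invert the map: for each module, which modules consume its output?
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk outward from the changed module.
    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in dependents.get(current, ()):
            if consumer != changed and consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


# Hypothetical application: each module lists the modules it depends on.
depends_on = {
    "checkout": ["cart", "pricing"],
    "invoice": ["checkout"],
    "cart": ["catalog"],
    "pricing": ["catalog"],
    "reports": ["invoice"],
    "login": [],
}

print(sorted(modules_to_retest(depends_on, "pricing")))
# ['checkout', 'invoice', 'reports']
```

The more edges the map has, the larger the returned set for any change, which is the testability cost of high dependency described above.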



The thread title is about defect clustering, but the first post mentions regression testing. So I’ll give my 2c on both here…

I’ve observed defect clustering in the real world many times. It can happen when, for example, a developer has a limited understanding of the requirements and therefore implements a feature wrongly in several places, i.e. as a tester you would expect to find a cluster of defects in a small functional area. It can also happen on a more limited scale when, for example, a developer fails to put a certain type of validation on a new field on a form. As a tester you may find a cluster of defects: the field allows special characters when it should only accept numbers, the feedback to the user is poor, and there is no limit on the field length, so any number of characters can be entered. I have found the vast majority of clustered defects during functional testing, but occasionally I’d find a defect during regression testing (see more on regression below), and when I investigated what had gone wrong, I found other defects in that same area, i.e. a cluster, perhaps because we didn’t have sufficient regression tests in place to cover all the functionality.

Regression defects are what I refer to as ‘inadvertent defects’. They are, for example, defects in areas of an application that have been created inadvertently, i.e. without development ever knowingly changing anything in that section of the application. An example may be a new value being entered in a config file. It should only apply to the section of the application for which it has been entered. If something broke in a totally different section of the application and that value was the root cause, then it would be a regression defect: the software has gone from working back to not working. It has regressed, the opposite of progressed.

Simply stated, regression testing is required because we cannot always foresee something like the above, where the software should work as before but has somehow become broken (lovely neutral term :)). Regression testing can thus help us have confidence that the software is working even in areas which have not been changed.


Defect clustering is basically the application of the Pareto principle to software testing: approximately 80 percent of the problems are found in 20 percent of the modules of an application.
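As a rough illustration, the sketch below checks how concentrated defects are, given a defect count per module. The counts and the `defect_concentration` helper are invented for this example; real numbers would come from your defect tracker, and the 80/20 split is a rule of thumb rather than a guarantee.

```python
def defect_concentration(defects_per_module, module_share=0.2):
    """Fraction of all defects found in the top `module_share` of modules."""
    counts = sorted(defects_per_module.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    # Always look at one module at minimum, even in a tiny application.
    top_n = max(1, round(len(counts) * module_share))
    return sum(counts[:top_n]) / total


# Hypothetical defect counts for a ten-module application.
defects = {
    "invoice": 46, "payments": 34, "login": 5, "search": 4, "profile": 3,
    "reports": 3, "settings": 2, "help": 1, "export": 1, "about": 1,
}

share = defect_concentration(defects)
print(f"Top 20% of modules hold {share:.0%} of the defects")
# Top 20% of modules hold 80% of the defects
```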

We all know that software can never be entirely bug free, and that problems are often introduced by the changes made to improve the customer experience. Once a bug is found, the next step is to get to the root cause of the defect and identify all the impacted areas. In a small application, it is normal that no single module has many defects. In a large application, however, we can expect a particular area to have more bugs than any other module.

The invoice module is one of the most common examples in applications with finance functionality. It is a huge module containing a lot of functionality, which makes it very vulnerable. Like invoices, there can be other modules that a team is nervous about. Such fragile features become hubs of defects in the application, so testers and developers pay more attention to them. These areas are the sensitive parts of the application and the ones most used by end customers.

There can be a number of reasons for defect clustering. Some of them are listed below:

  1. System complexity
  2. Volatile code
  3. Side effects of implementing changes
  4. Inexperience of Development staff

It is very useful for testing to reflect this clustered spread of defects and to target the areas of the application where a high proportion of defects is expected. However, the approach should never be to focus only on the vulnerable areas; the remaining parts of the application need attention too. There may be very few defects in the code outside the modules where defect clustering occurs, but testers still need to test those areas well and not neglect them.
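One way to picture this balance is a plan that gives every module a minimum amount of testing time and spreads the rest in proportion to past defects. This is only a sketch under assumed numbers; `allocate_test_hours`, the module names, and the hour figures are hypothetical.

```python
def allocate_test_hours(defects_per_module, total_hours, min_hours=1.0):
    """Give each module `min_hours`, then split the rest by defect history."""
    modules = list(defects_per_module)
    reserved = min_hours * len(modules)
    if reserved > total_hours:
        raise ValueError("total_hours is too small to give every module min_hours")

    remaining = total_hours - reserved
    total_defects = sum(defects_per_module.values())
    plan = {}
    for module in modules:
        if total_defects:
            weight = defects_per_module[module] / total_defects
        else:
            # No defect history at all: split the remaining time evenly.
            weight = 1 / len(modules)
        plan[module] = min_hours + remaining * weight
    return plan


# Hypothetical defect history and a 40-hour test budget.
history = {"invoice": 30, "payments": 15, "login": 5, "help": 0}
plan = allocate_test_hours(history, total_hours=40, min_hours=2)
for module, hours in plan.items():
    print(f"{module}: {hours:.1f} h")
```

With these numbers the invoice module gets the most time, while the help module, which has no recorded defects, still gets its two-hour minimum.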

Hope this information is helpful for you.


@defect clustering

In addition, in my experience, there may be factors outside the development team that also cause defect clustering:
* scope creep that was not addressed (less time for a large amount of functionality, vital changes late in the process, changes in other parts of the system)
* ambiguous descriptions, badly described functionality, misunderstood use cases

Defect clustering is also supported by research, so it is not only something that we as testers hold to be a “truth”.


Defect clustering: when a small number of modules contains most of the bugs detected or shows the most operational failures. It means that defects are not distributed evenly across the application but are concentrated in limited sections of it. This is particularly true for large systems, where complexity, size, change, and developer mistakes can affect the quality of the system and of particular modules. During testing, this phenomenon usually shows up because an area of the code is complex and tricky. Test designers often use this information when making risk assessments for test planning, and focus on these known areas.