08 November, 2011

The Return Of Dendrograms

What is Dendrogram-Based Testing? Well, what is a dendrogram to start with?

A dendrogram is a tree diagram that visualises hierarchical clustering. If that didn't help, a dendrogram basically groups objects in a tree view based on how similar they are. The closer the objects are drawn, the more similar they are.

Thanks for the maths lesson, but how is that useful in testing?

Good question. I'll come back with a final conclusion later in this post, but I can think of two uses for dendrograms:

Clustering defects: Visually show how similar the defects previously found are.

Clustering test charters (test cases): Visually show how similar planned test charters (or test cases) are.

In order to create dendrograms we need the objects, e.g. defects, to have such properties that we can measure distances between them. This is where it starts getting tricky - how do we measure the distance between two defects? The simplest thing to do is to think of properties we believe to be important and then assign them numeric values.

One example could be the property "User" (P1) and we could assign a defect a value between 0 and 5 for this property depending on how affected we think the user is by this bug. Another property could be "Performance" (P2) or "Business" (P3). Imagine we are testing a web shop and have two defects:

B1: The "This is a gift" checkbox is missing in the GUI.
B2: Memory issue that slows shopping down when you have more than 10 items in your cart.

Each of the two bugs has the properties P1, P2 and P3, and we might assign values as follows:

B1: P1=5, P2=0, P3=2
(the user is affected, the performance is not but the business flow is also affected)
B2: P1=3, P2=5, P3=0
(some users will be affected, the performance is affected, the business flow is not affected)
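To make "distance" concrete, here is a minimal Python sketch that treats each defect as a point (P1, P2, P3) and computes the straight-line (Euclidean) distance between B1 and B2. Euclidean distance is my assumption here; any other metric could be swapped in, and the function name is made up for illustration.

```python
import math

# Property values (P1=User, P2=Performance, P3=Business) from the example above.
defects = {
    "B1": (5, 0, 2),
    "B2": (3, 5, 0),
}

def euclidean_distance(a, b):
    """Straight-line distance between two equally long property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

distance_b1_b2 = euclidean_distance(defects["B1"], defects["B2"])
print(f"Distance B1-B2: {distance_b1_b2:.2f}")  # sqrt(33), roughly 5.74
```

The number on its own means little; it only becomes useful when compared with the distances between other pairs of defects.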

Based on these properties we can now see how similar the defects are in a dendrogram. In my earlier post I explained how to create a defect dendrogram with a simple example, and I'm not going to repeat that.

Similarly, we can assign properties to test charters or test cases and create dendrograms. Here the properties could be which actors, functions or areas are involved, and the dendrogram shows a kind of test coverage. If all test charters are grouped together, they test very similar things.

So how do we base our testing on dendrograms?

A defect dendrogram would of course be used to decide where to focus testing. I think isolated defects would be my priority. A single defect far away from all other defects seems unlikely to be alone; there may be more hiding nearby that need to be discovered. Then again, if a large number of defects are very similar, there is reason to believe that area requires special attention.
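One rough way to spot isolated defects without drawing anything is to rank each defect by the distance to its nearest neighbour. The sketch below assumes Euclidean distance and uses four hypothetical defects (D1 to D4) whose property values are invented purely for illustration.

```python
import math

# Hypothetical defects scored on (P1, P2, P3); the values are invented for illustration.
defects = {
    "D1": (5, 0, 2),
    "D2": (4, 1, 2),
    "D3": (5, 1, 1),
    "D4": (0, 5, 0),
}

def nearest_neighbour_distance(label, points):
    """Distance from one defect to its closest other defect."""
    return min(
        math.dist(points[label], other)
        for other_label, other in points.items()
        if other_label != label
    )

# Most isolated defect first.
ranking = sorted(
    defects,
    key=lambda label: nearest_neighbour_distance(label, defects),
    reverse=True,
)
print(ranking[0])  # D4 is furthest from everything else
```

With these made-up values D4 sits far from the tight D1/D2/D3 group, so it would be the first candidate for extra exploration.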

A test charter dendrogram would of course be used to help decide which charters to add. A single isolated test charter might be ok for a low-risk area, but might also be a warning flag.

Is this useful?

I have some serious doubts. Firstly, we need to find useful properties and assign them subjective values. The dendrogram will be based on those values and nothing else, so there is a huge risk of bias. Secondly, I have yet to find a good tool for drawing dendrograms. With more than three objects (defects or test charters) and two or more properties, it is impractical to do by hand. Of course, writing your own tool would not be too complicated.
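As a rough idea of what such a home-made tool could look like, here is a minimal Python sketch of bottom-up (agglomerative) clustering. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their two closest members); both are my choices for illustration, not the only options. It only prints the merge order and heights, which is the information a dendrogram draws. B3 is a hypothetical third defect added so there is something to cluster.

```python
import math
from itertools import combinations

def euclidean_distance(a, b):
    """Straight-line distance between two equally long property vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(c1, c2, points):
    """Single linkage: distance between the two closest members."""
    return min(euclidean_distance(points[i], points[j]) for i in c1 for j in c2)

def build_merges(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    points: dict mapping a label to a tuple of property values.
    Returns a list of (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = [(label,) for label in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], points),
        )
        merges.append((a, b, cluster_distance(a, b, points)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# B1 and B2 are from the example; B3 is a hypothetical defect for illustration.
defects = {
    "B1": (5, 0, 2),
    "B2": (3, 5, 0),
    "B3": (4, 1, 2),
}

merges = build_merges(defects)
for a, b, dist in merges:
    print(f"merge {'+'.join(a)} with {'+'.join(b)} at distance {dist:.2f}")
```

With these values B1 and B3 merge first, and B2 joins them much later, which is exactly the "B2 is the odd one out" picture a drawn dendrogram would give. The subjective property values are still the weak point; the code does nothing to reduce that bias.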

Right now I don't think the value gained outweighs the effort needed. I'm very interested in hearing arguments that I'm wrong, though.


  1. While it would be interesting, I also doubt the value of such a grouping, compared to the effort required to determine the relevant properties and ensure that each defect/charter was fully, consistently, and correctly encoded.

    For some high-value, high-risk, and well-funded projects, this would make an interesting study. For the kinds of business projects I work on, I don't see that I could get the time and funding needed to try it.

    For my projects, we tend to use a lighter, less-formal analysis ("which modules generated the most bug reports?"). Even then, the answers are less than definitive, perhaps because there always are so many variables left out of the analysis ("for this module that generated the most bug reports, did the fact that the best tester was assigned to it influence the bug count?")

  2. Reply to Joe:

    Thanks for your comment.

    No, I think it is too time-consuming, and more importantly it is very hard to think of an objective way to determine useful properties. It was an interesting thought experiment, but I don't think I will work on it any more.