10 November, 2011

Follow-up on xBTM


At STARWest 2011, I gave a talk about xBTM together with Michael Albrecht. Jon Bach was in the audience, and he gave us some very valuable feedback on our idea and how we presented it. This blog post is a follow-up on our conversation in Anaheim.

Recently I was testing version 1.5 of AddQ's SBTM reporting tool SBTExecute. The team wanted to release the new version as soon as possible, and I was travelling so there was not so much time for testing. We agreed that I would spend one morning - four hours - testing the final version. My method of choice was of course xBTM. In this blog post I will go through how I spent those four hours, and what was the outcome. For more background on xBTM and the tool, please refer to my earlier post.

Test plan

My initial step was to make a mind map test plan. My preferred (free) mind mapping tool is XMind. I opened a new mind map and placed my application under test (SBTExecute) as the central topic. Next I identified my key areas (also known as function areas). I spent a couple of minutes thinking about some obvious groups and came up with:
  • Configuration
  • Documentation
  • Running tool
  • Import
  • Generation
  • Report
I added these six key areas as subtopics in the mind map. Then I decided I also wanted to some stress testing and added that as a subtopic too. See Figure 1.

Figure 1: Mind map test plan. The central topic (SBTExecute 1.5) is the software under test. The subtopics are key areas, or test techniques, and grouped under the key areas are the test threads.
Then I spent about twenty minutes thinking about test ideas, or test threads, and writing them down in the mind map under the appropriate key area. Since this was not the first time I tested the tool, I already had some ideas of things to test. After a total of maybe half an hour or less, I had my test plan, see Figure 1.


I decided to start by testing Import, since that is the key feature of the tool, if you cannot import data from the session reports nothing else is really worth testing. I looked at the threads I had listed under the import node and figured that I could probably test all of them in one 45 min session. To show this in the mind map, I changed the colour of all those threads to blue, see Figure 2.

Figure 2: Testing all threads under the key area Import in one session.
I write my session reports using the SBTExecute Excel template, and I use the mind map note functionality to connect the session report to the mind map. The yellow paper icon on the subtopic Import shows that there is a note, which can be viewed and edited by clicking the icon, see Figure 3.

Figure 3: Adding notes in the mind map program. The note refers to the corresponding session report.
I ran the session, making notes in the session report. When I felt I was "done" testing a specific thread in that session, I visualised that in the mind map by adding a green check icon to that thread. Once in session, I realised that I didn't really want to test the thread Incorrect data, since there is data validation in the Excel template. I decided to pause that thread for now, and maybe come back to it later if I had time. Hence I added the pause icon to that thread. I also added some notes on why I paused the thread, see Figure 4.

Figure 4: Pausing the thread Incorrect data.
Then I realised I had forgotten to prioritise the key areas, so I did that using the colourful number icons in the mind map program. The highest priority is 1 and the lowest 6. See Figure 5.

Figure 5: Adding priority.
It turned out that the tool only reads files in xlsx format, and not old xls files. I wasn't sure if this was a feature or a bug, so I marked the thread with a question mark and made a note of it, see Figure 6.

Figure 6:  When there are questions, the thread is marked with a question mark and a note is added.
I ended the session, but the fact that xls files were not read made me curious so instead of starting a new session I decided to look at the configuration key area for a while. I spent a few minutes trying to create session reports in Office 97-2003 and Open Office and have the tool read them. These two threads were tested as threads rather than in a session. I only spent a few minutes on them since it was a low-priority area, made a short note, then paused the threads and decided to come back later. If I were to resume these threads (in the end I never did), I would keep making notes in the notes window, see Figure 7.

Figure 7: Testing two threads for a short time, then pausing them.
Next I decided to test all threads under the key area Generation in a session, the same way as I did with Import, see Figure 8.

Figure 8: Testing all threads under the key area Generation in a session.
Here I found a few defects, which is illustrated by the red X in the mind map. Whenever I found a bug, I made a note in the mind map of the ID number and added a short description. Of course this information was also added to the session report, see Figure 9.

Figure 9: Defects found are marked by a red X, and a note of the ID number is added.
After completing the Generation session, I started looking at the Report key area. Here I decided that the two threads Iteration Reports and Summary Reports were extensive enough to make up a session, see Figure 10.

Figure 10: Testing two threads in a session.
After running three sessions, and testing two threads seperately I had the following status in my mind map, see Figure 11.

Figure 11: Test status after running three sessions and testing two threads seperately.
The documentation for the tool consists of three manuals, and I felt they were best tested as threads. At this point I was running out of time, and decided to simply quickly skim through the manuals. I used the partially filled square icons to visualise how far I felt I had gotten with the manuals and made short notes in the mind map (no session report since I tested them as threads). The final couple of minutes I decided to spend on testing running the tool with correct parameters, which I also tested as a thread since there was not enough time for a session, see Figure 12.

Figure 12: Testing threads, using the partially filled squares to visualize progress.
Test report

Finally my four hours were up and I stopped all testing activities. At this stage I had a test report in the shape of the latest version of the mind map, see Figure 13.

Figure 13: Test report.
I also had three session reports (Import, Generation and Report) and an error list.

Given more time, I would have returned to the paused or partially tested threads and continued, adding notes in the notes window. I would also have spent some time on the previously untouched threads. Due to the very limited time in this case, the above story is not a very good example of thread-based testing, but I hope to have one soon.

08 November, 2011

The Return Of Dendrograms

What is Dendrogram-Based Testing? Well, what is a dendrogram to start with?

A dendrogram is a tree diagram that visualises hierarchical clustering. If that didn't help, a dendrogram basically groups objects in a tree view based on how similar they are. The closer the objects are drawn, the more similar they are.

Thanks for the maths lesson, but how is that useful in testing?

Good question. I'll come back with a final conclusion later in this post, but I can think of two uses for dendrograms:

Clustering defects: Visually show how similar the defects previously found are.

Clustering test charters (test cases): Visually show how similar planned test charters (or test cases) are.

In order to create dendrograms we need the objects, e.g. defects, to have such properties that we can measure distances between them. This is where it starts getting tricky - how do we measure the distance between two defects? The simplest thing to do is to think of properties we believe to be important and then assign them numeric values.

One example could be the property "User" (P1) and we could assign a defect a value between 0 and 5 for this property depending on how affected we think the user is by this bug. Another property could be "Performance" (P2) or "Business" (P3). Imagine we are testing a web shop and have two defects:

B1:The "This is a gift" checkbox is missing in the GUI.
B2:Memory issue that slows shopping down when you have more than 10 items in your cart.

Each of the two bugs have the properties P1, P2 and P3, and we might to assign values as follows:

B1: P1=5, P2=0, P3=2
(the user is affected, the performance is not but the business flow is also affected)
B2: P1=3, P2=5, P3=0
(some users will be affected, the performance is affected, the business flow is not affected)

Based on these properties we can now see how similar the defects are in a dendrogram. In my earlier post I explained how to create a defect dendrogram with simple example, and I'm not going to repeat that.

Similarily we can assign test charters or test cases properties and create dendrograms. Here properties could be which actors, functions or areas that are involved, and the dendrogram shows a kind of test coverage. If all test charters are grouped together, they test very similar things.

So how do we base our testing on dendrograms?

A defect dendrogram would of course be used to decide where to focus testing. I think isolated defects would be my priority. A single defect far away from all other defects seems too unlikely, maybe there are more hiding that need to be discovered. Then again, if a large number of defects are very similar there is reason to believe that area requires special attention.

A test charter dendrogram would of course be used to help decide which charters to add. A single isolated test charter might be ok for a low-risk area, but might also be a warning flag.

Is this useful?

I have some serious doubts. Firstly, we need to find useful properties and assign them subjective values. The dendrogram will be based on those values and nothing else, so there is a huge risk of bias. Secondly, I have yet to find a good tool to use to draw dendrograms. With more than three variables (defects/test charters) and two or more properties it cannot be done by hand. Of course, writing your own tool would not be too complicated.

Right now I don't think the value gained outweighs the effort needed. I'm very interesting in hearing arguments that I'm wrong though.