Tuesday, November 5, 2013

A Heuristic Approach to Generating “Good Enough” Weighted State Transition Probabilities


It started as a joke. Having recently watched the Big Bang Theory episode “The Herb Garden Germination” and reviewed one too many resumes listing the same buzzwords over and over, I had an idea: create some term or concept, inject it into the wild with enough backing information to make it sound legitimate, and see if it ever made it back to me in a resume and, if so, how long that took.

I needed something in an area that was being used where I worked as well as in enough other environments to be feasible, yet wasn't widely popular. The perfect candidate seemed to be model-based testing. So, I came up with the concept of Cross-Matrix Defect Analysis – multiplying a state transition matrix by a matrix of known defects to get a sort of weighted state transition matrix. I worked up a few formulas, wrote them on a whiteboard in a prime location at work, and recruited colleagues to help me plant the seeds so that when someone asked, “What’s that?” they could respond, “Oh, that’s something that Michael is working on for our model-based testing called Cross-Matrix Defect Analysis.”

But the more that I thought about it, the more I realized that there was actually something to this Cross-Matrix Defect Analysis, something beneficial to our model-based testing framework. We could rework the idea a little, substituting a state adjacency matrix for the state transition matrix, do a little matrix multiplication and row-normalization, and come up with a fairly quick and simple way to generate a state transition matrix based on some measured quantity, such as defect populations.

The typical adjacency matrix, represented here as \(A\), is an \(n \times n\) matrix (where \(n\) is the number of states in the model) where the entry \(a_{ij} = 1\) if state \(i\) is adjacent to state \(j\), and \(0\) otherwise.

If we let \(B\) be the \(n \times n\) matrix representing some measured quantity with respect to the application, such as the number of known defects, where the entry \(b_{ii}\) represents the frequency of the measured items present in state \(i\), then \(B\) is a diagonal matrix (with entries only on the diagonal of the matrix).

If we then multiply the two matrices \(A\) and \(B\), we get another matrix, \(C\), which is an adjacency matrix that has been weighted with respect to the measured quantities:
$$C = [A][B]$$
If we then compute the matrix \(C'\) by performing row-normalization on the matrix \(C\), letting \(c'_{ij} = \frac{c_{ij}}{\sum_{j=1}^{n}c_{ij}}\), then \(C'\) will be a stochastic matrix where \(c'_{ij}\) can be interpreted as the probability of transitioning from state \(i\) to state \(j\) weighted by the frequency count associated with state \(j\).
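As a sketch of the computation (the three-state model, adjacency matrix, and defect counts below are all hypothetical, chosen only for illustration), the weighting and row-normalization might look like:

```python
def weight_adjacency(A, counts):
    """C = A * B, where B = diag(counts), so c_ij = a_ij * counts[j]."""
    n = len(A)
    return [[A[i][j] * counts[j] for j in range(n)] for i in range(n)]

def row_normalize(matrix):
    """Divide each row by its sum, yielding a stochastic matrix."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total for x in row] if total else row[:])
    return result

# Hypothetical 3-state model: state 0 is adjacent to states 1 and 2, etc.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
defects = [2, 5, 1]          # the diagonal of B: known defects per state

C = weight_adjacency(A, defects)
C_prime = row_normalize(C)   # each row now sums to 1
# From state 0: P(-> 1) = 5/6, P(-> 2) = 1/6
```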

However, this method of generating a state transition matrix can result in unreachable states when the measured quantity for one or more states is zero. For example, if we are using defect populations for weighting and no defects have been identified for the login screen, then the probability of reaching the state representing the login screen would be zero, meaning that the login screen would never be reached. In many cases this issue can be overcome by applying the constraint that each state in the model must be reachable, and requiring the frequency count for each state to be greater than or equal to one. This can be addressed by incrementing the count of each \(b_{ii}\) entry by one, which can be accomplished by adding the identity matrix, \(I\):
$$C = [A][B + I]$$
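A minimal sketch of the adjusted computation, again using a hypothetical three-state model, this time with one state that has no recorded defects and would otherwise become unreachable:

```python
def weighted_transitions(A, counts, add_identity=True):
    """Row-normalize C = A * (B + I), where B = diag(counts).

    Assumes every state has at least one adjacent state, so no row of C
    sums to zero after the identity adjustment.
    """
    n = len(A)
    adjusted = [c + 1 if add_identity else c for c in counts]  # B + I
    C = [[A[i][j] * adjusted[j] for j in range(n)] for i in range(n)]
    return [[x / sum(row) for x in row] for row in C]

A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
defects = [2, 5, 0]   # state 2 has no known defects

P = weighted_transitions(A, defects)
# Without the +I adjustment, column 2 would be all zeros and state 2
# would be unreachable; with it, P(0 -> 2) = 1/7 instead of 0.
```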
This adjustment causes an issue of its own, because we have compromised the accuracy of what we are using to weight our state transition probabilities. But is the solution it provides good enough to solve the initial problem we’re trying to solve? It does cause a perturbation in the values (the Observer Effect) used to generate the transition probabilities, but does that really matter? It’s often the case that the degree to which the counts are affected can be considered negligible or minor. For example, when referring to defect populations, are we counting all defects, or are we counting known defects with the understanding that there may be one or more undiscovered defects? If it’s the latter, then incrementing our count by one could be OK. In other cases what we are measuring may be a highly subjective estimate, such as expected traffic through a particular function of the application, or perceived risk.

The point is that we often apply heuristics to help us establish probabilities (not certainties) of execution flow through a system. Generating weighted state transition probabilities following this method is simply another application of a heuristic – it yields an approximate solution which may be considered good enough in some contexts if we are willing to exchange optimality, completeness, and accuracy for an approximate solution that we calculate quickly.

If we are willing to accept these trade-offs, then we can expand the result and let the matrix \(B\) denote any \(n \times n\) matrix that represents any known, estimated, or heuristic measure, such as:
  • Defect Populations
  • Defect Injection Rates
  • Business Criticality
  • Function Points (complexity)
  • Application Use (traffic)
  • Application Change

The application of this method has many benefits, including producing a good enough solution quickly enough to solve the problem. Furthermore, using model-based testing and any quantifiable measure that can be applied to each state of the model, we can generate a transition probability matrix which may then be used to automatically generate test cases that are statistically directed towards areas of the application affected by that measure. It also means that we no longer need to guess what the state transition probabilities are, which in turn implies that the reliance on domain knowledge to generate transition probabilities can be removed, thereby eliminating a bottleneck and a potential point of failure.


By the way, I’m still waiting for Cross-Matrix Defect Analysis to show up in a resume.

Saturday, October 19, 2013

My First Tracking Class - A Lesson Learned in Context

My wife and I took the Ridgeback for her first tracking class last Saturday at the Oak Ridge Kennel Club. It was basically an introduction to tracking for the handlers, with a short hands-on exercise with the dogs to wrap things up. The exercise was not to have the dogs track at this point; the instructors wanted to get the dogs to realize that there is something “out there” for them to find, and when they do find it they hit the jackpot with treats galore, exuberant praise, and playtime. The idea is to get the dogs to follow a trail of food from a starting point with one article, usually a sock or a glove, to the end where a matching article is located.

We laid the track by using a survey flag to mark the start of the track, and we would place the first article on the ground, step on it, and place a piece of food on top of it. We would then walk twelve to fifteen steps, place another survey flag, get the dog’s attention, wave the second article around while they were watching, then place it on the ground, step on it, and place a good treat on top. The handler then walked back to the starting flag, making sure to follow the exact path they took out, dropping food at every step, in effect creating a double-laid track for the dog to follow.

The handler then retrieved their dog, took them to the start of the track, and attempted to keep them on the track while the dog sniffed out the food that had been dropped, all the while staying behind the dog so that it had to work out what the task was. If the dog was having trouble finding the food, the handler was allowed to step up with the dog and help them find it. Once the dog reached the end of the track they were rewarded with more treats, and had a short play session with the handler, then another, slightly longer track, was laid and the exercise was repeated.

What I found interesting was how the dogs approached the task differently. One dog was so interested in getting to the end of the track that it passed up treats that were lying in plain sight. Another showed no interest in the treats at all, and was content to just sit at the start of the track until they were led from treat to treat, finally reaching the end of the track. The third dog would find one treat, and then begin searching to the side of the track until the handler directed it back to the track. The fourth dog started on the track, found each piece of food that had been dropped, until it reached the article at the end of the track.

On the drive home from class it struck me how important context was to understanding each situation. At this point it wasn't that some dogs were better at tracking than others, although that may prove to be the case; it was that the context was different for each dog. Since they knew nothing about tracking at this point, and had yet to figure out why they were there, each dog had a different objective in mind.

The dog that bypassed all the treats to get to the end was a younger dog that had been playing with his handler and the article they had brought, so his context was that he got to play some more when he got to the article. The dog that showed no interest in the treats and had to be led to each one had just been fed, wasn't really treat-driven, and had no real interest in finding the treats laid out on the track. The dog that kept going off track after finding a treat was distracted by one of its owners sitting off to the side, and so it kept trying to get over to them. The last dog that went from treat to treat was very food-driven, wanting nothing more than to find another treat.

My takeaway from this is that context-driven testing applies to more than just testing; it applies to the attitudes of the testers, as well as to managing those attitudes and testers. As a test manager I need to not only be aware of the attitudes of my testers, but I need to understand how context affects and shapes their attitudes. I need to be aware of how I can best work with them, assist them, and guide them in that context. After all, people, working together, are the most important part of any project’s context.

Friday, October 11, 2013

Using Twitter to Reach a Larger Audience

It's time for Techtoberfest at work – a technology summit held every October that provides an opportunity for employees from all over the company to sign up and present on a technology of their choice. It’s traditionally been very developer-heavy, and submissions from my team are often rejected with a quick “we already have a presentation about testing, we don’t need any more.” But there was a changing of the guard this year, and a new group was asked to take it over and revamp it. Seeing an opportunity to beat our own drum and show off the hard work the test automation group had put into completing the model-driven automation framework, I asked the automation lead to submit a presentation.

As we were brainstorming about the presentation, we came up with the idea to use the AutoTweet PowerPoint add-in to tweet the talking points as the presentation progressed. We created a Twitter account to use (@ScrippsQA), installed the add-in, and got to work on the presentation. It wasn’t until a week or so before the presentation was to be delivered that we learned the organizers wanted everyone to use the same laptop to avoid glitches, snafus, and keep things rolling smoothly. While not a show stopper, we knew that we would not be able to install the AutoTweet add-in on that machine, so automatically tweeting the talking points was out. Not wanting to totally abandon the Twitter idea, we decided that we could manually send the tweets during the presentation.

There’s nothing groundbreaking in doing this, but we thought it might generate more involvement or interest in the presentation, and provide an alternate method for people to participate if they were unable to actively attend. Also, because of the rapid growth of social media and online collaboration tools and their acceptance in the workplace, it gave us the opportunity to show that we’re proactive and ahead of the curve; we’re not your parents’ QA department. Going beyond Techtoberfest, we could see continued use of Twitter as another viable means of communication with both our teams and our customers.


With that thought, I’ll leave you with the Storified version of the tweets we made.


Friday, September 27, 2013

The Macro Expansion Heuristic – A Real World Example

I enjoy learning new things, and I especially enjoy making connections between things I've learned or read about, and applying them to real-world situations. I had the opportunity to do that today while I was brainstorming with a colleague for a presentation on model-based testing and its application in intelligent automation. We were talking about the benefits that model-based testing provided when my colleague said something that caught my attention: we are modeling requirements.

I tend to be overly careful in conversations about model-based testing because my experience comes from modeling the states of use of an application and not the approach we’re taking with model-based automation, and I don’t want to use the incorrect context. So I stopped and thought about it for a few seconds before telling my colleague that I wasn't comfortable with that statement. Something about it didn't sit right with me, and I suspected it was that the statement either did not correctly say what was meant or did not mean what was said.

We talked it through and mapped it out in our mind map, but were still unable to agree that the statement was correct; by now I believe that my colleague was also unsure as to whether or not he had said what he meant or meant what he said. It then occurred to me that I had seen a similar problem before, and that problem had been presented along with a heuristic that we could apply to solve it – the concept of a macro expansion, which I had first seen applied to communication by Michael Bolton in his blog post “What Do You Mean By ‘Arguing Over Semantics’?”

When we generalize on the concept of a macro expansion we can utilize it to take a single word and expand it into a series of words that more accurately express what we mean. I asked my colleague if we could do a small exercise on the white board to sort this out, and he agreed. Taking one of the markers present I wrote on the board:

“We’re modeling the requirement.”

As I was writing I told my colleague that if we were to expand that statement, we would get something that better said what was meant. So, under that I wrote:

“We’re modeling that the requirement was correctly implemented.”

After further discussion we agreed this could be shortened without a loss of clarity, so I wrote:

“We’re modeling correct implementation.”

We were both comfortable with that statement, and felt that it truly expressed what we were attempting to say. 

Having the feeling that we were on to something, we took this one step further. If we’re modeling correct implementation then any discrepancy between the actual implementation and what we have modeled means that there is a problem. That means that the models we are creating are mechanisms by which we recognize a problem. That sounds rather familiar....


Thursday, September 12, 2013

Takeaways from Our First Lean Coffee

I've noticed over the past several months that the team’s Monday morning weekly kick-off meeting was becoming more and more ineffectual. It originally began with the idea of getting team members interested and involved in the projects and tasks their teammates were working on, allowing the opportunity to meet, provide input and insight, and take a break from their own projects. What it degenerated into was a high-level review of Friday’s weekly report. It wasn't quite as bad as some of those daily stand-ups I've attended where team members point at the board and mumble “Yesterday I worked on that. Today I will work on this. No obstacles.” However, it was certainly getting close. We would occasionally find the rare nugget of information that wasn't reported on Friday, but it was certainly failing to meet its initial charter. It had been my idea, and I had the responsibility to either make it work or cancel it.

I had spoken with the team a few weeks ago, asking if they wanted to cancel the meeting, citing a lack of ROI and better use of their time, yet each one voted to keep it going. I gave it a little more time, noticing that they still really weren't that into it. I kept going back to the juxtaposition of the vote to continue holding the meeting and the apparent lack of engagement and effective communication in the meeting. That, to me, signaled that the team really wanted to communicate, but the outlet provided by the meeting wasn't the proper outlet for that. But what was? What was lacking in this meeting?

I tried to think of other, more productive uses of everyone’s time. I toyed with the idea of making the Monday morning rounds to their desks to talk about their projects, taking up only as much of their individual time as needed. That sounded like a pretty good use of everyone’s time, and would allow me to get the information I was after, but it didn't allow them to interact or perceive new ideas. I kept trying to think of other ways to get engagement from the team, but was unable to identify any viable solutions. Until CAST2013.

I was watching the Twitter feed during CAST2013, and started seeing tweet after tweet about the Lean Coffee. Lean Coffee? Really? I didn't recall ever having heard about Lean Coffee before, so I did a little research and quickly became intrigued. Lean Coffee sounded simple. It sounded relevant. I asked peers if they had any experience with holding a Lean Coffee, and received some very favorable responses.

Last week was the last time we held our status meeting. I canceled the meeting in Outlook and scheduled the Lean Coffee for the same time slot and place, providing a link to leancoffee.org and a YouTube video that talked about the mechanics while showing a Lean Coffee in progress. I was expecting a slow start with a smattering of topics provided as the team figured out how everything worked without an agenda, what we were going to talk about, even what we could talk about.

This week, we started our Lean Coffee. This week, the results were truly outstanding. The team was engaged, they provided excellent topics, they discussed each topic excitedly (often voting to continue the discussion), and they voluntarily extended the meeting an additional thirty minutes to address additional topics. When we finally concluded the meeting, we had resolved several topics, identified three longer-term action items to be addressed outside of the Lean Coffee, modified the board to better address the longer-term action items, and everyone left that meeting still talking excitedly, including me.

I knew that the Lean Coffee had hit on something, but I was unsure about what, exactly, that was. Later, after I had time to pause and reflect on our first Lean Coffee, I realized that there were several factors that made our move to Lean Coffee a success.

Although I had made the connection between the vote to continue the Monday meeting and the continued ineffectiveness of the meeting to address its charter, I had not understood what the relationship was. Ultimately, the meeting was not valuable to the team because it did not address their needs and concerns; most meetings do not address the needs of the attendees, they address the needs of the meeting organizer. So, there we were, every Monday, using up valuable time, including a portion of the limited face-time they have with me, their manager, and I was not addressing their needs and concerns, I was addressing mine and what I thought theirs were. So, the team’s perception has been that the meeting was neither relevant nor timely to them.

A second takeaway was that the best way to get the team engaged in a meeting is to have them help generate the meeting’s agenda. The team was eager to discuss new concepts and thoughts on improving our test efforts when they were provided in the context of their concerns and the issues they were currently facing, because they were both pertinent and timely. Helping to create the meeting’s agenda also gives the team a sense of ownership in the meeting, and with it, responsibility for the meeting’s success. It also demonstrates that management not only values their involvement and their contributions to improving our testing efforts, but is actively pursuing their input. That, to me, is truly vital. Even a good self-organizing team doesn't want to be self-organizing in isolation. They welcome and seek out management’s involvement throughout the organizational process.


These insights were just from our first Lean Coffee, and I’m very excited to see what happens next. I've already seen one team lead propose changing the team’s weekly meeting to a Lean Coffee, and another has proposed using a Lean Coffee format to present a portion of our seven year strategic plan to our management team. That’s what I call engagement!

Sunday, September 8, 2013

Doing the Math for Assessing Communication: The Bijective Oracle

I've seen some conversations and blog posts recently about whether or not we should be arguing over semantics. Some I read and follow, some I don’t, but someone recently directed the people involved in one of these threads to a blog post by Michael Bolton. The post is entitled “What Do You Mean By ‘Arguing Over Semantics’?”, and I think it provides one of the best discussions I've seen on this subject. The thing that caught my attention was when Michael finishes the post by saying:
“There’s a common thread that runs through these stories: they’re about what we say, about what we mean, and about whether we say what we mean and mean what we say. That’s semantics: the relationships between words and meaning. Those relationships are central to testing work. 
If you feel yourself tempted to object to something by saying “We’re arguing about semantics,” try a macro expansion: “We’re arguing about what we mean by the words we’re choosing,” which can then be shortened to “We’re arguing about what we mean.” If we can’t settle on the premises of a conversation, we’re going to have an awfully hard time agreeing on conclusions.”
That really struck a chord with me because it highlights what I see as one of the larger obstacles to effective language-based communication: the one-to-many relationship that exists between a word (the one) and the meanings (the many) generally associated with that word. This is easily extended beyond single words to include groups of words structured to form larger constructs such as clauses, phrases, sentences, paragraphs, etc., in which case we now have a many-to-many relationship between words/constructs and meanings. There can be many disconnects between what we say and what we mean, and we often don’t know whether we actually say what we mean and mean what we say. Moreover, if we’re not sure, how can the people we’re trying to communicate with be sure?

One oracle I try to apply when it comes to assessing the effectiveness of language-based communication is the mathematical relation of the bijective function, which is a relationship that is both injective and surjective. Yeah, I’m a math geek, but bear with me while I describe this oracle.

In mathematics, a function is simply a relationship between a set of inputs, called the domain, and a set of outputs, called the range or co-domain, in which every member of the domain is mapped to exactly one member of the co-domain. If we apply this to the relationships between words and meanings by mapping each word/construct to exactly one meaning within the context of our current communications, namely the meaning we are trying to convey, then we have made some progress in alleviating the issues that arise from the many-to-many mapping between words (the set of inputs) and meanings (the set of outputs), restricting the mapping and thereby reducing it to a many-to-one mapping. But at this point we still don’t know if we are saying what we mean and mean what we say; we just know what we mean, and it has many ways to be said. Effective communication needs to use the same words to convey the same meanings.

The problem is that we haven’t addressed the fact that different words/constructs can be used to convey the same meaning. To do this we need to apply the oracle and determine whether every meaning is mapped to at most one word/construct. In the language of mathematics, we would say that our relationship would then need to be injective, or one-to-one, so that not only is every input mapped to one output, but also every output is the image of at most one input; no two inputs produce the same output. So now, if we use our oracle to assess our communication, we should be able to better see if we have established a one-to-one mapping between what we say and what we mean. If not, then there is a possibility that we are not saying what we mean and meaning what we say.

For our communication to be truly effective, we need to make sure that every meaning has been mapped to a word/construct in our conversation. Have we left anything unsaid? Have we used words/constructs to explicitly cover all meanings we wanted to cover? Every meaning we want to convey needs to have a relationship to a word/construct we have used in our communication. This means that our relationship needs to be not only injective, but that it also needs to be surjective, so that every element in our output set is mapped to a corresponding element in our input set. If we do this, then we've worked to ensure that our communication is complete.
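As a toy sketch of this oracle (the word-to-meaning mapping and the set of intended meanings below are purely hypothetical), the injectivity and surjectivity checks could be expressed in code:

```python
def assess(mapping, intended_meanings):
    """Return a list of communication problems found by the bijective oracle.

    mapping: dict of word -> the one meaning it conveys in this conversation.
    intended_meanings: the set of meanings we want the conversation to cover.
    """
    problems = []
    # Injective check: no two words should carry the same meaning.
    seen = {}
    for word, meaning in mapping.items():
        if meaning in seen:
            problems.append(f"'{word}' and '{seen[meaning]}' both mean '{meaning}'")
        seen[meaning] = word
    # Surjective check: every intended meaning must be expressed by some word.
    unsaid = intended_meanings - set(mapping.values())
    for meaning in sorted(unsaid):
        problems.append(f"nothing said for '{meaning}'")
    return problems

# Hypothetical example: two words share one meaning, one meaning goes unsaid.
mapping = {"bug": "defect", "fault": "defect", "oracle": "problem-recognizer"}
intended = {"defect", "problem-recognizer", "test idea"}

for problem in assess(mapping, intended):
    print(problem)
```

An empty problem list suggests (within this toy model) that the mapping is bijective: we have said what we mean, once, and left nothing unsaid.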


So, using our bijective oracle, we can look for potential problems in our communication. Is our communication injective? Have we reduced the many-to-many relationship between words and meaning down to a one-to-one relationship? Is our communication surjective? Have we said everything we need to say? If so, then there’s a pretty good chance that we are saying what we mean and mean what we say.

Sunday, August 25, 2013

Mission Statement, Definition of Software Testing, and Goals of Software Testing

Why do I blog?
What’s the difference between a good tester and a great tester? I think the main thing is the ability to think for yourself and to be able to incorporate your experiences as a tester back into the context of your testing practices.  I think that if you look at the software testing community and pay attention to who has good ideas and who does not, you’ll find that the vast majority of people with good ideas emphasize their experience, what they have learned from it, and how they incorporate that back into their testing.

Writing about my thoughts and experiences in software testing provides an opportunity for me to take a critical look at what I thought about a subject, assess it in the context of experience and information gained since I first came to think that way, and then update or reaffirm my thoughts on the subject. It also allows me to share my thoughts, experiences, successes and failures with others, creating an additional feedback loop. That, to me, is one of the real benefits because it keeps my thoughts and beliefs relevant rather than dogmatic.

What is Software Testing?
So, if I'm planning to write about testing software, I guess I should start with what that means. If a software product is a solution to a problem, then software testing would be the activities we undertake to ensure that the solution not only addresses the problem, but that it addresses the problem correctly. Software testing transcends the traditional concept of just running test scripts on applications, and I think that’s what William Hetzel was getting at in The Complete Guide to Software Testing, when he said “testing is any activity aimed at evaluating an attribute or capability of a program or system and determining that it meets its required results.” Testing includes the analysis we perform on an oversimplified user story to find out what the meat of the story really is, the acceptance criteria we help establish, the conversations we have with the development team or product owner as the code is being written, on and on. I could even go a little on the heretical side and suggest that software testing is every activity that a tester performs that contributes to the dialog about the solution to the problem that we are trying to address.

My Purpose of Testing?
·  To ensure that the problem that is to be addressed is understood.
·  To ensure that the solution to the problem is understood.
·  To start and maintain a conversation about the problem and the solution with all involved parties while the solution is being developed.
·  To ensure that the final solution does indeed address the problem for which it was created.
·  To ensure that the problem is addressed in a complete, consistent and traceably correct manner.