By “self-testing” systems I mean systems that use their own functionality to help test themselves - often, to help determine what the correct result should be. This is a little different from what we typically do when, for example, writing unit tests, where we work out what the result should be by some other means (working it out in “our head” is a common approach!)
Using existing functionality this way can sometimes lead to very easy and efficient ways of testing things and ensuring quality. I’ll give a few examples of situations we have had at ABS.
As background, here’s a quick summary of how one of our systems works.
In processing our data, we have a number of “edits” which step through each record and check for errors or inconsistencies. If a “trigger” detects an error, either some “action” is automatically applied to fix it, or a screen is shown which allows a human to select an appropriate action.
So, here are some of the features we have used to help make this system self-testing:
Reapply the trigger after the action has been applied
Once the action has been applied, the error should hopefully be fixed! If the edit still triggers after the action has been applied, then the error hasn’t been fixed - and we can see straight away whether this is the case. So the system continually re-checks the edit after each action, and only moves on once the edit no longer triggers. This also helps with manual testing: in effect it allows every use of the system to be a test, where the user can quickly see if something is not working correctly. If the same edit keeps occurring, then the action is not fixing the problem, and something is wrong. To emphasise the point: the gain here is that we don’t just have a separate testing phase where we look for errors - every use of the system in production effectively becomes a test.
This is a very simple and obvious example, and trivial to implement, but provides value.
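The loop described above really is only a few lines. Here is a toy Python sketch - not the actual system’s code; the negative-income edit and the MAX_ATTEMPTS guard are my own invented details:

```python
# Toy sketch of "reapply the trigger after the action has been applied".
# An edit is a (trigger, action) pair; everything here is invented.

MAX_ATTEMPTS = 5  # guard against an action that never actually fixes the error

def process_record(record, edits):
    """Run each edit against a record; only move on once its trigger stops firing."""
    for trigger, action in edits:
        attempts = 0
        while trigger(record):              # re-check the trigger after every action
            if attempts == MAX_ATTEMPTS:
                raise RuntimeError("action did not fix the error - something is wrong")
            action(record)
            attempts += 1
    return record

# Made-up edit: a negative income is an error; the action clamps it to zero.
edits = [(lambda r: r["income"] < 0,        # trigger: does the error exist?
          lambda r: r.update(income=0))]    # action: attempt the fix

fixed = process_record({"income": -500}, edits)
assert fixed["income"] == 0                 # the trigger no longer fires
```

An action that fails to clear its own trigger would loop until the guard fires, which is exactly the “same edit keeps occurring” signal described above.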
Built in scripting capability
Each time this system runs, it records everything that happens, ie which edit triggered on which record and which action was taken to resolve it. The system can then be rerun with the script file that was produced (and the same input data), and it will follow the script and perform the same actions as originally recorded. If different edits trigger, an error is reported.
This allows us to easily set up regression tests from any run of the system, whether it was a test run or a real production run. For any run, we can take the input and output data files, and the script, and rerun the system and compare the outputs. (We have another system where we can store all these files and automatically rerun them.)
Implementing this was a minimal amount of work, as the system had a kind of MVVM (Model-View-ViewModel) design which kept the screens and logic separate, so replaying a script file could directly invoke the appropriate screen functionality without needing actual user interaction with the screen.
A kind of cool thing in implementing this was that it is easy to check whether the scripting capability itself is working correctly: while replaying the input script, the system is simultaneously generating a new script (as always, it records exactly what it is doing), which should be effectively identical to the original (apart from timestamps etc). If there are differences, then the input script was somehow not replayed exactly. A comparison of the input and output script files shows us whether it worked. Just a nice little example of self-testing software.
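To sketch how little machinery record/replay plus the self-check needs, here is a toy Python version. The JSON-lines script format, the edit and action names, and the `run` signature are all invented for illustration:

```python
import json

def run(records, find_edit, actions, choose, script_in=None):
    """Process records, always recording a script. If script_in is given,
    follow it instead of asking `choose`, and flag any divergence."""
    replay = [json.loads(l) for l in script_in.splitlines()] if script_in else None
    out, step = [], 0
    for i, rec in enumerate(records):
        edit = find_edit(rec)
        while edit is not None:
            if replay is not None:
                entry = replay[step]
                if entry["edit"] != edit:         # a different edit triggered
                    raise AssertionError(f"replay diverged at step {step}")
                name = entry["action"]            # follow the script
            else:
                name = choose(rec, edit)          # interactive in the real system
            actions[name](rec)
            out.append({"record": i, "edit": edit, "action": name})
            step += 1
            edit = find_edit(rec)
    return "\n".join(json.dumps(e) for e in out)

# Made-up edit and action: negative income, clamped to zero.
find_edit = lambda r: "negative income" if r["income"] < 0 else None
actions = {"zero": lambda r: r.update(income=0)}

data = [{"income": -5}, {"income": 30}]
script = run([dict(d) for d in data], find_edit, actions, lambda r, e: "zero")

# Replay against fresh copies of the same input data; the replay's own
# recording should come out identical to the script it was fed.
script2 = run([dict(d) for d in data], find_edit, actions, choose=None,
              script_in=script)
assert script == script2
```

The final assertion is the self-check described above: the replay run records a new script as it goes, and comparing it to the input script tells us whether the replay itself worked.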
Again the gain here is that just using the system is in effect creating a test - we don’t need a separate scripting tool to record/playback a session.
As part of the above work to implement scripting, it became trivial to implement a “random” mode, where the system randomly chooses an action to take in response to any edit that triggers. Given some input data, we can let the system randomly explore, choosing any available action. Eventually (hopefully!) the data is all fixed and the session ends. (Or else it crashed along the way, which is a good result, because now we definitely know there is a bug.) We can then check the output data for correctness - we have some automatic validations that can be applied, or we can manually inspect the data.
This lets many more paths through the system be explored than we have time to cover manually. It still remains a fairly manual and laborious task to check the output data if the automatic validations don’t find any issues. When we do find issues, we can fix the system (and add more automatic validations).
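Once actions are just data, random mode is only a few lines more. A toy sketch (all names invented), with a seeded RNG so that any crash it finds can be reproduced:

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Edit:
    name: str
    trigger: Callable          # record -> bool: does the error exist?
    actions: List[Callable]    # candidate fixes, each: record -> None

def random_explore(record, edits, max_steps=1000, seed=None):
    """Randomly pick actions until no edit triggers, or give up."""
    rng = random.Random(seed)  # a fixed seed makes any crash reproducible
    trail = []                 # as always, record exactly what was done
    for _ in range(max_steps):
        firing = [e for e in edits if e.trigger(record)]
        if not firing:
            return trail       # all errors fixed: the session ends normally
        edit = rng.choice(firing)
        action = rng.choice(edit.actions)
        action(record)
        trail.append((edit.name, action.__name__))
    raise RuntimeError("still triggering after max_steps actions")

# Made-up edit: negative income, with two competing fixes to choose between.
def zero_income(r): r["income"] = 0
def flip_sign(r):  r["income"] = -r["income"]

neg = Edit("negative income", lambda r: r["income"] < 0, [zero_income, flip_sign])
rec = {"income": -500}
trail = random_explore(rec, [neg], seed=1)
assert rec["income"] >= 0    # the random session ended with the data fixed
```

The trail doubles as a script of the session, so an interesting random run can be replayed later as a regression test.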
Test data generation
I think this deserves a topic of its own! But I’ll briefly describe an approach we are trying here.
Because our business logic is specified in a DSL (domain-specific language), for which we have tools to parse and generate code, we can use the business logic as constraints which we may (or may not) want our test data to comply with. For example, if the trigger for an edit was something like “person’s age is under 10 but they are married” (a made-up example), then we can use this to automatically generate data which definitely triggers the edit, eg by assigning random ages and, if the age is under 10, making the person married. Alternatively we can do the opposite and generate data which will not trigger the edit. Or generate different persons with ages 9, 10, 11, etc. Of course, this is just one particular condition - and we have 100s or 1000s of conditions which relate to each other. So in the end we potentially want to apply all these constraints to end up with data which satisfies certain conditions and breaks others.
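As a toy illustration of the idea, using the made-up age/married condition above, with hand-written generators standing in for what would really be derived from the DSL’s parse tree:

```python
import random

def trigger(person):
    """The edit's trigger condition, hand-translated from the (made-up) rule."""
    return person["age"] < 10 and person["married"]

def gen_triggering(rng):
    """Force the condition true: pick an age under 10, then set married."""
    return {"age": rng.randrange(0, 10), "married": True}

def gen_non_triggering(rng):
    """Force the condition false: either age >= 10, or not married."""
    if rng.random() < 0.5:
        return {"age": rng.randrange(10, 100), "married": rng.random() < 0.5}
    return {"age": rng.randrange(0, 100), "married": False}

rng = random.Random(0)
assert all(trigger(gen_triggering(rng)) for _ in range(100))
assert not any(trigger(gen_non_triggering(rng)) for _ in range(100))

# Boundary sweep, as in the "ages 9, 10, 11" idea:
boundary = [{"age": a, "married": True} for a in (9, 10, 11)]
assert [trigger(p) for p in boundary] == [True, False, False]
```

The real work is in generating these from the DSL automatically (and in combining 100s of interacting conditions), but the shape of the output is this simple.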
This could lead to us slightly extending our existing DSL to allow easier specification of test data to be generated.
I know I haven’t explained this very well here - I could go into more detail and give a better explanation in a separate post, if anyone is interested. But I did want to include it here because it’s a rather different example of using existing functionality to help test itself: taking the business logic formally defined in a DSL, plus the tools for working with that DSL, and repurposing them to generate test data conforming (or not) to those rules.
Anyway, that’s a few examples of things we have done which have been useful for us. Finding these sorts of opportunities depends very much on the individual system - so I’m definitely not saying these examples apply to anyone else’s systems. From my own experience, though, I’ve found it worthwhile to look for and make these opportunities - because when you can find them, they can be invaluable. On the other hand, if it is simply too much work to include appropriate self-testing functionality, then the cost might not be worth it.
We could probably come up with a list of general techniques, eg
Round tripping - where some functionality is effectively the reverse of another, so after applying both we should end up with the original (in some form)
Idempotency - we should expect the same result however many times we apply the operation. So even without knowing what the result should actually be, if we know it should stay the same, we can check for that
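Both of these properties are cheap to check generically. A small Python sketch - the JSON round trip and the whitespace-normalisation example are my own illustrations:

```python
import json

def check_round_trip(value, forward, backward):
    """Applying forward then backward should give back the original value."""
    result = backward(forward(value))
    assert result == value, f"round trip changed {value!r} to {result!r}"

def check_idempotent(value, op):
    """Applying op a second time should change nothing."""
    once = op(value)
    assert op(once) == once, f"op is not idempotent on {value!r}"

# Round trip: serialise a record to JSON and parse it back.
check_round_trip({"age": 9, "married": True}, json.dumps, json.loads)

# Idempotency: whitespace normalisation applied twice equals applied once.
normalise = lambda s: " ".join(s.split())
check_idempotent("  some   messy    input ", normalise)
```

Note that neither check needs to know the correct answer - each one uses a known relationship between results, which is exactly the self-testing idea.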
I’m keen to hear what other people have done in this kind of area.
Or, if you have a particular situation that you need to test, maybe people could come up with suggestions for how to make it self-testing?