"Self-testing" systems

By “self-testing” systems I mean systems that use functionality within the system to help test the system itself. Often this means using that functionality to help determine what the correct result should be. This is a little different from what we tend to do, for example, in writing unit tests, where we work out what the result should be by some other means (working it out “in our head” is a common approach!)

Using existing functionality this way can sometimes lead to very easy and efficient ways of testing things and ensuring quality. I’ll give a few examples of situations we have had at ABS.

As background, here’s a quick summary of how one of our systems works.
In processing our data, we have a number of “edits” which step through each record and check for errors or inconsistencies. If a “trigger” detects an error, either some “action” is automatically applied to fix it, or a screen is shown which allows a human to select an appropriate action.

So, some of the features we have used to make this self-testing are:

Reapply the trigger after the action has been applied
Once the action has been applied, the error should (hopefully!) be fixed. If the edit still triggers after the action has been applied, then the error hasn’t been fixed - and we can see straight away whether this is the case. So the system re-checks the edit after each action, and only moves on once the edit no longer triggers. This also helps with manual testing: in effect it turns every use of the system into a test, where the user can quickly see if something is not working correctly. If the same edit keeps occurring, we know the action is not fixing the problem and something is wrong. To emphasize that point: the gain here is that we don’t just have a separate testing phase where we look for errors - every use of the system in production effectively becomes a test.

This is a very simple and obvious example, and trivial to implement, but it provides real value.
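To make that concrete, here is a minimal sketch of the idea (hypothetical types and names, not our actual code):

```csharp
// Minimal sketch of "re-apply the trigger after the action" (hypothetical names).
using System;
using System.Collections.Generic;

public class Person
{
    public int Age { get; set; }
    public decimal Income { get; set; }
}

public class Edit
{
    public string Id { get; init; } = "";
    public Func<Person, bool> Trigger { get; init; } = _ => false;  // true = error detected
    public Action<Person> Fix { get; init; } = _ => { };            // the action applied
}

public static class Processor
{
    public static void Apply(IEnumerable<Edit> edits, Person person)
    {
        foreach (var edit in edits)
        {
            if (!edit.Trigger(person)) continue;  // no error, move on

            edit.Fix(person);

            // The self-test: re-apply the trigger. If it still fires, the action
            // didn't fix the error, and every run of the system flags this immediately.
            if (edit.Trigger(person))
                throw new InvalidOperationException(
                    $"Action for edit '{edit.Id}' did not resolve the error.");
        }
    }
}
```

The key point is the second `Trigger` call: the check costs almost nothing, but it runs on every record in every production session.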

Built in scripting capability
Each time this system runs, it records everything that happens, i.e. which edit triggered on which record, and which action was taken to resolve it. The system can then be rerun with the script file that was produced (and the same input data), and it will follow the script and perform the same actions as originally recorded. If different edits trigger, an error is reported.

This allows us to easily set up regression tests from any run of the system, whether it was a test run or a real production run. For any run, we can take the input and output data files, and the script, and rerun the system and compare the outputs. (We have another system where we can store all these files and automatically rerun them.)

Implementing this was a minimal amount of work, as the system had a kind of MVVM (Model-View-ViewModel) design which kept the screens and logic separate, so replaying a script file could directly invoke the appropriate screen functionality without needing actual user interaction with the screen.

A rather cool thing about implementing this was that it is easy to check whether the scripting capability itself is working correctly, because in applying the input script the system is at the same time generating a new script (as always, it records exactly what it is doing), which should be effectively identical to the original script (apart from timestamps etc.). If there are differences, then somehow the input script was not replayed exactly, and a comparison of the input and output script files shows us whether it worked. Just a nice little example of self-testing software 🙂

Again, the gain here is that just using the system in effect creates a test - we don’t need a separate scripting tool to record and play back a session.
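A rough sketch of how the replay and the script self-check could fit together (again, hypothetical names and a made-up pipe-delimited line format; the real script records more detail):

```csharp
// Sketch of the "replaying a script regenerates the same script" self-test.
using System;
using System.IO;
using System.Linq;

public static class ScriptSelfCheck
{
    // replaySystem(inputData, scriptIn, scriptOut) re-runs the system following
    // scriptIn, while recording everything it does (as it always does) to scriptOut.
    public static void Verify(string inputData, string inputScript,
                              Action<string, string, string> replaySystem)
    {
        string outputScript = Path.GetTempFileName();
        replaySystem(inputData, inputScript, outputScript);

        // Ignoring volatile fields (here assumed to be a leading timestamp),
        // the regenerated script should match the one we just replayed.
        var original = File.ReadLines(inputScript).Select(StripTimestamp);
        var replayed = File.ReadLines(outputScript).Select(StripTimestamp);

        if (!original.SequenceEqual(replayed))
            throw new Exception("Replay diverged from the recorded script.");
    }

    private static string StripTimestamp(string line) =>
        string.Join("|", line.Split('|').Skip(1));
}
```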

Random mode
As part of the above work to implement scripting, it became trivial to implement a “random” mode, where the system randomly chooses an action to take in response to any edit that triggers. Given some input data, we can let the system randomly explore and choose any available action. Eventually (hopefully!) the data is all fixed and the session ends. (Or it may have crashed along the way, which is a good result, because now we definitely know there is a bug.) We can then check the output data for correctness - we have some automatic validations that can be applied, or we can manually inspect the data.

This lets many more paths through the system be explored than we would ever have time to cover manually. It does remain a fairly manual and laborious task to check the output data if the automatic validations don’t find any issues. If we do find issues, we can then fix the system (and add more automatic validations).
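Sketched below is roughly how random mode can drop in once scripting exists (a hypothetical interface; the real system differs):

```csharp
// Sketch of "random mode": when an edit triggers, pick any available action
// at random instead of asking a user.
using System;
using System.Collections.Generic;

public interface IActionChooser
{
    // Given the actions available for a triggered edit, pick one to apply.
    string Choose(IReadOnlyList<string> availableActionIds);
}

// Interactive mode would show a screen; random mode just picks blindly.
public class RandomChooser : IActionChooser
{
    private readonly Random _rng;

    public RandomChooser(int seed) => _rng = new Random(seed);

    public string Choose(IReadOnlyList<string> availableActionIds) =>
        availableActionIds[_rng.Next(availableActionIds.Count)];
}
```

One design choice worth considering is seeding the random generator, as above: if a random session crashes, the same seed reproduces the exact sequence of choices.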

Test data generation
I think this deserves a topic of its own! But I’ll briefly describe an approach we are trying here.
Because our business logic is specified in a DSL (domain-specific language) that we have tools to parse and generate code from, we can use our business logic as constraints which we may (or may not) want our test data to comply with. For example, if a trigger for an edit was something like “person’s age is under 10 but they are married” (just a made-up example), then we can use this to automatically generate data which definitely triggers the edit, e.g. by assigning random ages and, if the age is under 10, making the person married. Alternatively, we can do the opposite and generate data which will not trigger the edit. Or generate different persons with ages 9, 10, 11, etc. Of course, this is just one particular condition - and we have 100s or 1000s of conditions which relate to each other. So in the end we potentially want to apply all these constraints, to end up with data which satisfies certain conditions and breaks others.
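To make the made-up “under 10 but married” example concrete, here is a sketch of the kind of generator a rule like that could drive (hypothetical hand-written code, not our DSL tooling):

```csharp
// Sketch of constraint-driven test data generation for the made-up rule
// "person's age is under 10 but they are married".
using System;
using System.Collections.Generic;

public class Person
{
    public int Age { get; set; }
    public bool Married { get; set; }
}

public static class TestDataGen
{
    private static readonly Random Rng = new Random(42); // fixed seed for repeatability

    // A person who definitely triggers the edit.
    public static Person Triggering() =>
        new Person { Age = Rng.Next(0, 10), Married = true };

    // A person who definitely does not trigger it: anyone aged 10+,
    // or someone under 10 who is unmarried.
    public static Person NonTriggering()
    {
        int age = Rng.Next(0, 120);
        return new Person { Age = age, Married = age >= 10 && Rng.Next(2) == 0 };
    }

    // Boundary cases around the age condition, with both marital statuses.
    public static IEnumerable<Person> Boundary()
    {
        foreach (int age in new[] { 9, 10, 11 })
            foreach (bool married in new[] { true, false })
                yield return new Person { Age = age, Married = married };
    }
}
```

In practice, generators like these would be emitted automatically from the parsed DSL rules rather than written by hand like this.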

This may lead us to slightly extend our existing DSL to allow easier specification of the test data to be generated.

I know I haven’t explained this very well here - I could go into more detail and give a better explanation in a separate post if anyone is interested. But I did want to include it here because it’s a somewhat different example of using existing functionality to help test itself: taking the business logic formally defined in a DSL, plus the tools for working with that DSL, and repurposing them to generate test data that conforms (or doesn’t conform) to those rules.

Anyway, those are a few examples of things we have done which have been useful for us. Finding these sorts of opportunities depends very much on the individual system - so I’m definitely not saying that these examples apply to anyone else’s systems. From my own experience, though, I’ve found it worthwhile to look for and create these opportunities, because when you can find them they can be invaluable. On the other hand, if it is simply too much work to include appropriate self-testing functionality, then the cost might not be worth it.

We could probably come up with a list of general techniques (sketched in code below), e.g.

- Round-tripping - where some functionality is effectively the reverse of another, so after applying both we should end up with the original (in some form)
- Idempotency - we should expect the same result however many times we apply the operation. So regardless of knowing what the result should actually be, if we know it should be the same each time, we can check for that
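Both of these can be checked without knowing what the correct answer actually is. A minimal sketch (hypothetical functions supplied by the caller):

```csharp
// Generic self-checks: round-tripping and idempotency.
using System;

public static class PropertyChecks
{
    // Round-trip: decoding what we encoded should give back the original,
    // without ever needing to know what the encoded form "should" look like.
    public static void CheckRoundTrip(string input,
        Func<string, byte[]> encode, Func<byte[], string> decode)
    {
        if (decode(encode(input)) != input)
            throw new Exception("Round-trip failed.");
    }

    // Idempotency: applying the operation twice should equal applying it once.
    public static void CheckIdempotent(string input, Func<string, string> operation)
    {
        string once = operation(input);
        if (operation(once) != once)
            throw new Exception("Operation is not idempotent.");
    }
}
```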

I’m keen to hear what other people have done in this kind of area 🙂
Or, if you have some particular situation that you need to test, maybe people could come up with suggestions for how to make it self-testing?

Thanks Glenn, very interesting. In Employment we are definitely delving into data generation, both to fuel automated tests and to fuel manual tests.

However, we haven’t done much ‘self-testing’. This is an area to explore more. I am having some trouble envisioning your sample system, though - is it more of a batch or background process? So the process executes, and you then immediately trigger a check to see whether it succeeded?

Also, how does it ‘record’ itself continuously?

Hi Cam,

I probably didn’t explain it in enough detail.
We break the data up into batches small enough for individual users to process. So, imagine a batch consisting of 1000 person records. The system steps through each person record, applies all the rules to that person, then moves on to the next. When it comes to a person where one of the rules fails, i.e. the edit triggers, it may show a screen to the user to allow them to fix the problem with that person. For example, if the trigger is that the person’s age is under 15 yet their income is over $100,000, then when the system gets to a person aged 14 with income $150,000, the edit triggers and the screen is shown. The user will look at the relevant information (which might include images of the form that collected the data) and might choose the action to modify the age to 74 (because maybe the handwriting was bad, and 74 was OCR’ed as 14). The edit is then checked again, and won’t trigger because the data is now good, so the system moves on to applying further edits, and then to the next record.

Alternatively, some edits may be resolved automatically, so the fix will be applied, and the user sitting there will be none the wiser that anything has happened.
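In code terms, that age/income example boils down to something like this (a simplified, hypothetical rendering - the real edits are defined in our DSL):

```csharp
// The age/income example as a trigger plus one possible user-selected action.
public class PersonRecord
{
    public int Age { get; set; }
    public decimal Income { get; set; }
}

public static class AgeIncomeEdit
{
    // Trigger: a person under 15 with income over $100,000 looks inconsistent.
    public static bool Triggers(PersonRecord p) => p.Age < 15 && p.Income > 100_000m;

    // One available action: correct the age (e.g. a 74 that was OCR'ed as 14).
    public static void SetAge(PersonRecord p, int correctedAge) => p.Age = correctedAge;
}
```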

So it is basically an online system, not really batch. But if no errors requiring human intervention are detected, the user might be sitting there for a few minutes with nothing to do, just watching the progress bar.

Hope that’s a little clearer? If not, let me know.

As for recording itself, the system simply logs what happened, in terms of which edits triggered on which records and which actions were applied. So in the case above, the information recorded is the key identifying the person record, the ID of the edit, the ID of the action, and the new age entered by the user - i.e. enough detail to replicate what happened. When replaying the system on the original data, the same edit as before will trigger, and the system reads from the script file to see what it should do in response.
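As a purely illustrative example (not our actual file format), each script entry only needs something like:

```csharp
// Purely illustrative script entry - just enough detail to replay the decision.
public record ScriptEntry(
    string RecordKey,    // identifies the person record, e.g. "batch07/person0042"
    string EditId,       // which edit triggered, e.g. "E_AGE_INCOME"
    string ActionId,     // which action was applied, e.g. "A_SET_AGE"
    string Parameters);  // the action's inputs, e.g. "age=74"
```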

Again, if that’s not clear enough, let me know.

Hi again @glenn.roberts,

Is this a web-based online system? What tool do you use for recording/logging, or is it bespoke? Isn’t that a lot of recorded data? Do you archive or cull it periodically?

You must have a pretty good map of your edits to a list of business rules?

No, it’s not web-based; it’s a desktop system, just a C# program (.exe).
The logging and script recording is just some bespoke code - it’s pretty simple, just writing a few things to a text file.

Basically, after processing, every “workload” ends up with a matching script file which records what happened. Maybe it is a lot of data? But we know we are going to be working with massive amounts of data, so we make sure we provision as much disk space as required. Eventually we archived stuff, but for the most part I think we didn’t bother - we had enough space.

“You must have a pretty good map of your edits to a list of business rules?”
Well… this is the beauty of the DSL approach! 🙂
They are basically one and the same thing. The business clients define their business rules as edits using our domain-specific language (which, as the name suggests, is purpose-built for our specific business domain).

So instead of writing out their requirements (i.e. business rules) in English and getting the developers to build a system to implement those rules, they write the rules in the DSL, and the actual system (C# code) is generated automatically (more automation!).

So yes, the “mapping” of business rules to edits is very clear and direct.
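Just to give a flavour of the idea (this example is entirely made up - it is not our actual language), a rule and its generated code might look like:

```csharp
// Entirely made-up illustration. A business user might write a rule such as:
//
//   edit AgeIncome:
//     trigger when age < 15 and income > 100000
//     action SetAge(newAge): age = newAge
//
// and the generator would emit the equivalent C#:
public static class Generated_AgeIncome
{
    public static bool Trigger(int age, decimal income) =>
        age < 15 && income > 100_000m;

    public static void SetAge(ref int age, int newAge) => age = newAge;
}
```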

What DSL do you use?

It’s our own in-house developed language, tailored specifically to provide the capability needed to process our data. It’s aimed at our business users (who aren’t programmers), so it needs to be as close to their business domain as possible, and as simple as possible, without the “incidental complexity” that we as programmers generally have to deal with. Things which are well understood and agreed upon (and which are unlikely to change) are abstracted away and automatically taken care of, leaving the users to deal with only the “important” things that actually make up their business rules.

To make an analogy, imagine you were designing a language to define user screens (windows). Do we need to include the ability to specify whether the screen is rectangular or circular? Probably not: in the systems we currently deal with, screens are rectangular, never circular, so there’s no point adding that complexity to the language. If the need for circular windows ever arises, we can think about adding that complexity then.

To implement our language, we hand-crafted the parser and code generation (in C#), but we are now looking at using tooling to do a lot of that work for us. For example, check out Xtext, an Eclipse plugin which lets you build DSLs along with the tooling to use them, e.g. editors, code generation etc.

I love this concept. Was it a lot of effort to put the DSL together? The Eclipse tool makes it sound like that may be an easier path to take, even for .NET - https://blogs.itemis.com/en/language-development-on-.net-with-xtext-part-1-overview

Still, I’ll bet it would take a dedicated push and dedicated resources to get this together from scratch in a timely manner.

Could you apply this to a more general back-end API/service-style processing setup, do you think? Or is it something that lends itself better to more targeted processes?

How hard was it to get your BAs to start using it? Have you seen benefits on the tester side too?

Yes, I think the DSL concept is really useful, as long as you have a subject-matter domain that lends itself to this kind of approach. These days I tend to treat it as my default approach, and only if it doesn’t look as though it will work do I try something else.

But it really is just the usual thing we do in software development, perhaps taken a bit further than usual - we always have to understand and analyse our subject matter, and we typically come up with classes/APIs/frameworks/etc. to represent and encapsulate that domain. DSLs take this a step further and surface all of that in an actual language which both developers and business people can hopefully read, understand and write.

I think one of the really good things about it is that it forces you to think about your domain. You need to understand it deeply to be able to come up with a language that works well.

Was it a lot of effort? Hmm… I’d say a moderate amount at the time, which was about 15 years ago. Adding to the work was that the translator from our DSL to C# was written entirely by hand. Reducing the work, though, was that we haven’t really provided a good authoring experience - people just write their specifications in an editor with syntax highlighting.

This is where we’re thinking Xtext can be of real value - generating the parser automatically while also offering a much better authoring experience. So, less work, but more functionality.

From the beginning I wouldn’t say we had specific BAs - really just the business users. But yes, they were on board from the start, and over the years they have loved it. It gives them great freedom to specify (and automatically build!) the systems they want, without having to wait for the developers to build them. It definitely can depend on the people, though. Some people just cannot think logically at the level of detail that is sometimes required. However, I don’t think that means you can’t use this approach at all - for example, you can still have the developers writing it and the business users just reading it.

In terms of testing, we’re hoping to gain some benefit by using the business rules written in the DSL to automatically generate test data. In fact, we may extend the DSL to make specifying test data simpler.

The business people also end up testing the systems they specify and generate as well - in this particular case, we haven’t had dedicated testers involved.

All in all, I thoroughly recommend giving it a try, whatever your business domain, and seeing if it can work for you.