Scale your thinking for better testing

I went to an absolutely fascinating talk a while ago, back when we could go to talks. To be honest it was a very strange corporate event, but it was local and a friend was speaking. One particular talk caught my eyes and ears: an Ops Engineer had carved out a new path, load testing to forecast cloud spend. Many clients were being burned by exponential costs as their businesses grew, and apparently most cloud cost calculators provided by vendors are broken nonsense, full of marketing bias and false assumptions.

This really struck me. If we as testers are looking to create value, this was a great example of addressing a huge business risk through testing. I found myself asking lots of questions and discussing the example further with the speaker afterwards. Then it struck me again. I hear a lot about testers becoming automation experts, accessibility gurus or white hat hackers. All very worthy, but the performance and load tester option is still a rarely trodden career path, and a great proponent is hard to find. It starts with scaling your thinking. Rather than poking at a system in a test environment with one webserver and saying everything looks fine, ask what happens when there are a hundred load-balanced webservers. Or message queues with multiple partitions and elected leaders in a cluster. You get the picture.

There are many compelling reasons to make this mindset shift, and it can really set you apart when it comes to testing:

  • Systems at scale – the first time I had to test something at scale, millions-of-concurrent-users scale, it changed the way I looked at testing. The reality was that at this scale, pass and fail don’t cut it. You have to learn to think in ranges and tolerances, not binary. If you have millions of users, then for someone, somewhere, your application will be slow, broken and annoying. Plus, you know that hard-to-reproduce bug? If you can find it, then thousands of customers will find it too. You might get more help pinpointing it by mentioning this in passing. When you model for load or performance testing, you appreciate the boundaries of your testing more acutely. Done doesn’t really mean done until you can do it for the number of users you need, at the speed you need it. There’s a small sketch of what thinking in tolerances can look like after this list.
  • Valuable automation – as mentioned above, test automation is big; most job adverts talk about it and we talk about it continuously. A targeted, fast set of automated acceptance tests is an asset to any pipeline. But you need the pipeline, and the ability to deploy an application that is ready to serve requests in a safe manner. For the most part, test automation at an application level is a pain without the ability to deploy. If you aim to build in some form of early load testing, you will start to build a picture of what you need to deploy safely and start a meaningful test: instance sizes, load balancing, base data, health checks, smoke tests. All the stuff you need for a great deployment pipeline, all the way to Production.
  • Early load testing – load testing? You don’t need to do that yet. I have heard that from an array of Project Managers, Scrum Masters, even Developers, on projects in various levels of chaos. If you bring forward at least some of your load testing, your project will thank you for it. Fight for it. Here’s the secret: load testing isn’t really load testing. It’s meeting-your-wider-project-and-product-goals testing, certainly from a technical standpoint. If it’s too late and the architecture is too firmly set, you might find one of those issues that give me a tingle of both excitement and fear: the architecture killer. If that happens the day before a release, you will be popular.
  • Earlier concurrency problems – you will find all those lovely, early concurrency bugs that appear once you have built a component or two. As soon as you can string an HTTP API and a database together, give it a go with a little more than checking the connection. You might get a nice surprise and a very valuable bit of information about code, configuration or both. From painful and joyful experience, definitely do this for stream processing technology. I’m talking to you, every Kafka implementation I’ve ever done. There’s a rough sketch of this kind of first nudge after this list too.
  • Blast radius – a major version upgrade of PHP. A new database driver. A data migration. Basically, any time you ask for guidance on what to test and the answer is ‘everything’. If you have expanded your performance and load testing skills, you have a powerful wide-area testing ability, a tool much more appropriate to the job at hand when coupled with exploratory testing and effective test automation. Add this to a sensible, gradual release strategy with decent monitoring, and scary ‘test everything’ changes are a lot less terrifying.
  • Balanced test approaches – all test automation, all exploratory testing and mindmaps, all one way or the other. Load testing skills and tooling mean you can inject randomness at scale into your system and see what happens, something so often missing from test strategies. The world is often obsessed with determinism when it comes to testing; soaking a system in randomness will tell you something different. Your biases, product and domain knowledge mean your exploratory testing won’t generate enough serendipity. Sorry.
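
To make the ‘ranges and tolerances’ point a bit more concrete, here is a small sketch in Python. It assumes you already have a pile of response times from a load test run; the 800ms and 1% budgets are invented for illustration, not recommendations. The point is that the verdict is a judgement against tolerances, not a single pass or fail.

```python
import statistics

def within_tolerances(samples_ms, errors=0, p95_budget_ms=800.0, error_budget=0.01):
    """Judge a load test run against tolerances rather than pass/fail.

    samples_ms: response times (milliseconds) for the successful requests.
    errors: count of failed requests in the same run.
    The budget values here are illustrative assumptions, not recommendations.
    """
    # quantiles(n=100) returns the 1st..99th percentiles; index 94 is the 95th.
    p95 = statistics.quantiles(samples_ms, n=100)[94]
    error_rate = errors / (len(samples_ms) + errors)

    print(f"p95 = {p95:.0f} ms (budget {p95_budget_ms:.0f} ms)")
    print(f"error rate = {error_rate:.2%} (budget {error_budget:.2%})")

    # Some individual requests will always be slow or broken at scale;
    # the run is acceptable if it stays inside both budgets overall.
    return p95 <= p95_budget_ms and error_rate <= error_budget
```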
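
And for the earlier concurrency problems, this is the sort of first nudge I mean, a rough sketch rather than a proper load test. It assumes a hypothetical /orders endpoint on a locally running API and uses the requests library to fire ten identical requests at once, then prints what came back.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8080"  # hypothetical local API under test
CONCURRENCY = 10

def create_order(attempt):
    # The same request everyone else is sending, at the same moment.
    response = requests.post(
        f"{BASE_URL}/orders",
        json={"customer_id": 42, "item": "widget", "attempt": attempt},
        timeout=5,
    )
    return response.status_code

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        statuses = list(pool.map(create_order, range(CONCURRENCY)))

    # Deadlocks, exhausted connection pools, duplicate rows and stray 500s
    # tend to show up here long before the 'real' load testing starts.
    print(statuses)
```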

Just being able to do performance and load testing is a great indicator of quality. Performance and load testing is often at its most frustrating when you are still sweeping out bugs, still trying to deploy safely, still clearing out databases, still wrestling with wobbly operability and more. That is a sign you shouldn’t ignore.

My advice on getting started:

  • DO follow and investigate the work of those who have worked in large-scale environments. Charity Majors, Liz Fong-Jones and Cindy Sridharan are excellent examples. Working at scale gives a unique perspective on long-held principles about software development, usually that all is not as it seems. You may not work for Google, but taking how you test and extrapolating that to thousands of customers can change the way you look at a problem. This is especially true for deploy and release strategies. A surprising number of organisations still look to mitigate their release risk with testing in unrepresentative test environments. It’s insane, and testers go along with it, even taking the heat for it. Dark launches, feature flags, A/B testing: learn them from those who use them, and encourage it.
  • DO look at the various works of Scott Barber. This is foundational stuff: heuristics and mnemonics, particularly on risk and modelling load, which is where the value lies. It is also an extension of a tester’s feel for risk, but at a product or project level rather than feature by feature. I have been a massive fan of the FIBLOTS mnemonic for many years and have used it in client and conference workshops to great effect.
  • DO listen to the PerfBytes podcast, particularly the ‘News of the Damned’ episodes. It shows how common issues around performance and load really are, and how nuanced a world it is. In fact, it can be cool to say your site collapsed under load when selling tickets to a festival; the only thing worse than being talked about is not being talked about, after all. These examples speak to the nuance of load testing as part of the wider testing craft, with humans at the centre of it all. Like a bug in a feature, a bug in the capacity or performance of a system still provokes an emotional reaction. Maybe even more so, as such bugs are often less tangible, harder to pinpoint and have many interrelated causes. I have seen people get very mad about them.
  • DO start small with one of the strangely military-themed CLI-based load testing tools, like Artillery or Barrage. These wonderful tools are great at finding those (seemingly) obvious concurrency bugs, the ones that appear when you try 5 or 10 of the same thing at the same time for the first time. They can even be run against a local environment quite reasonably, at very little cost to you and with no massive setup needed.
  • DO advocate for load testing libraries rather than tools where possible. I was on a team of Python developers, so I used Locust. Using JMeter would have placed a language barrier and a user interface barrier in the way of developer engagement, and that’s two barriers too many. Once your developers are engaged, this opens the door to really evolving your framework and the scenarios you want to run and, more importantly, scenarios that target the areas developers are worried about. There’s a minimal example of what this can look like after this list.
  • DON’T start with (the book) Site Reliability Engineering. It is seriously dense stuff and will wear your brain out; build up to that one. There is more to SRE than performance and load testing, after all, but a lot of it does dovetail nicely together. I actually preferred the SRE Workbook: much more practical and parsable for my average brain. Maybe that was the wrong way around, but it worked for me.
  • DON’T get into terminology battles. As usual with a testing blog, a word on terminology: avoid the debate around non-functional/para-functional/paranormal/whatever testing. Lumping these areas together is just not helpful. Talk about what risk you want to test for. Long-running Java process? Worried about garbage collection? Hell yes you want a long-running test, at a load similar to production. Say that; it’s easier to understand. Maybe later, call it a soak test or something you all understand.
  • DON’T beat yourself up if you don’t understand much at first. It’s a deep area of study. This is general advice to be fair, but always important.
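
As a flavour of the library approach from the Locust point above, here is a minimal sketch of a Locust test file. The endpoints, search terms and task weights are hypothetical; the point is that it is just Python, so the developers on your team can read it, extend it and aim it at the parts of the system they are worried about.

```python
import random

from locust import HttpUser, task, between

SEARCH_TERMS = ["kafka", "partitions", "load balancer", "blast radius"]

class BrowsingUser(HttpUser):
    # A little randomness in pacing and data goes a long way.
    wait_time = between(1, 3)

    @task(3)
    def search(self):
        # Hypothetical search endpoint; the weight of 3 makes this the common journey.
        self.client.get("/search", params={"q": random.choice(SEARCH_TERMS)})

    @task(1)
    def view_product(self):
        # Hypothetical product page, hit less often.
        self.client.get(f"/products/{random.randint(1, 100)}")
```

Run it with something like locust -f locustfile.py --host pointed at your test environment, start with a handful of users and grow from there.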

In the end, no one has ever said, and meant it, that a system is fine when it’s painfully slow and falls over when two people try to do the same thing at once. To paraphrase the context-driven testing principles: if it doesn’t work at scale, it just plain doesn’t work. Scale your thinking to avoid falling into this trap…