30 Days of AI in Testing – Days 26-30

The final post of 30 Days of AI in Testing offered by the excellent Ministry of Testing.

https://www.ministryoftesting.com/events/30-days-of-ai-in-testing

This blog is a record of Days 26 to 30, rather than having them just on the Club. It’s quite a lot of writing, and I like to have my writing in one place where possible.

Day 26

The Carbon Footprint of AI

This always reminds me of the paperclip game, where you end up consuming all the resources in the universe to make paperclips. When we create a new technology (Bitcoin mining, for example), we end up firing up old coal power stations to feed the hunger. I assume Generative AI will be the same: as models become more energy efficient, we will just ask them to do more and more. Perhaps a large language model will eventually come up with a plan to address climate change that everyone will be on board with.

I’ll have a look at this today, I think, and try to list some strategies for reducing that footprint:

https://www.forbes.com/sites/bernardmarr/2023/03/22/green-intelligence-why-data-and-ai-must-become-more-sustainable/?sh=5e6a6a27658c

  • Interesting that they compare training a model to the lifetime emissions of five cars. We could do with a lot fewer cars, to be fair, before worrying about AI’s carbon footprint.
  • I like some of the ideas about where to do your processing, looking for areas that have a lot of hydroelectric power generation for example.
  • I’m wary of calculators provided by cloud providers – if they are as accurate as the cost calculators, then we are in environmental trouble. However, they might serve as a starting point.
  • The 4M best practices are a good guide that brings the other points in the article together, although some organisations that are large enough are returning to on-prem as cloud costs get higher and higher.

In terms of AI in testing, it would be lovely to have an AI capability which:

  • Powers down test infrastructure when it’s not in use.
  • Limits test environments to specific purposes (or types of testing), rather than spinning them up because the previous test strategy said so.
  • Runs tests when it makes sense to do so, overnight for longer test runs for example.
  • Helps filter our automation for repetition and needless tests that haven’t failed for eons (a rough sketch of what I mean follows this list).
  • Models which ask for better prompts, instead of just responding. 🙂
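
On that filtering point, here is a minimal sketch of the kind of heuristic I have in mind, assuming the test history has been exported as a CSV of test name, run date and outcome. The file name, column names and six-month threshold are all made up for the example:

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical export of test run history: test_name, run_date, outcome
HISTORY_FILE = "test_history.csv"
STALE_AFTER = timedelta(days=180)  # "eons" = no failure in six months

def find_never_failing_tests(history_file: str) -> list[str]:
    """List tests that have never failed, or not failed within the stale window."""
    last_failure = defaultdict(lambda: None)
    seen = set()
    with open(history_file, newline="") as f:
        for row in csv.DictReader(f):
            name = row["test_name"]
            seen.add(name)
            if row["outcome"] == "fail":
                failed_at = datetime.fromisoformat(row["run_date"])
                if last_failure[name] is None or failed_at > last_failure[name]:
                    last_failure[name] = failed_at
    cutoff = datetime.now() - STALE_AFTER
    return sorted(
        name for name in seen
        if last_failure[name] is None or last_failure[name] < cutoff
    )

if __name__ == "__main__":
    for test in find_never_failing_tests(HISTORY_FILE):
        print(f"Candidate for review (no recent failures): {test}")
```

A tool like this only nominates candidates for review; a human (or a smarter assistant) still decides whether a never-failing test is needless or simply guarding something important.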

Day 27

Assess Current State and Identify Capability Gaps

Let’s start with the current state:

  • To be honest, I think we would need to do more to find out where our current testing pain points are before even beginning on the AI in testing path. Perhaps we could start with:
    • Identifying scenarios, hidden requirements and error handling needs earlier and during story kick off.
    • Selecting unit and widget tests during development.
    • Exploratory testing note taking and guidance.
    • Identifying and adding integration test candidates (in Flutter tooling terms, basically end-to-end tests).
  • The team is fairly unaware of how AI could currently be used in testing, so some education and research time is required.
  • The team shows resistance to using a Generative AI within the product, dismissing it as untrustworthy. Education on using models trained on data of our choosing seems important.
  • The team is heavily wedded to AWS, so a good place to start might be tooling in that area, as there is less friction. Maybe a spike type activity with the AWS Bedrock set of tools.

Develop a Roadmap

I’ve always rather liked the Now/Next/Later format for roadmaps, where Now is small, Next is a little bigger, and Later is a parking place for the bigger, less certain ideas.

  • Now – a spike for AWS Bedrock capabilities, followed by a hack-day type exercise for adding a Generative AI to the product. I would use something like LangChain and LangSmith to build it, as they have good testing and observability capabilities (a minimal sketch of the spike follows below). Basically, try to get the team excited by AI.
  • Next – investigate how to select unit and widget testing scenarios with a private model, possibly using our codebase as training data, plus best practices for low-level testing in Flutter and beyond. I say this because Gen AI is good at structured tasks, less so at exploratory ones.
  • Later – train our own model using data from GitHub and Jira about the product, including testing notes. Use that to generate test ideas, while building training and guidance on prompt engineering. This is to aid with exploratory testing.

This might all change after assessing current testing capabilities, but it’s a starter.
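
For the Now spike, something this small would be enough to prove the plumbing and the tracing work end to end. A minimal sketch, assuming the langchain-aws package, AWS credentials with Bedrock model access and a LangSmith account; the model ID, region and environment variable names may differ by version:

```python
import os

from langchain_aws import ChatBedrock

# LangSmith tracing for observability of the spike (assumes a LangSmith account).
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"

# The model ID is an assumption - use whichever model your AWS account has enabled in Bedrock.
llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="eu-west-1",
)

response = llm.invoke(
    "Suggest five exploratory testing charters for a Flutter mobile app."
)
print(response.content)
```

If that runs and the call shows up as a trace in LangSmith, the hack day has a working starting point.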

Day 28

Which walkthrough you choose and why

I chose walkthrough 3, as Retrieval Augmented Generation addresses the two main challenges beyond privacy: the total confidence with which an LLM can give an incorrect answer, and keeping your model up to date as your testing progresses.

This video was a great plain-language explainer too, I think, and could be shared with stakeholders:

https://research.ibm.com/blog/retrieval-augmented-generation-RAG

How well you think this approach addresses your concerns about data privacy and/or context awareness.

I fed the model four testability PDFs I created about the smells of hard-to-test architecture. I changed the prompt to:

Summarise the testing smells that show your architecture has poor testability

From what I can see from the output (I didn’t use the interface):

RAG PROMPT - Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

There was an explicit instruction not to make up an answer!

It used two of the PDFs to come up with its answer, which was correct; the other two didn’t reference smells:

Helpful Answer: The testing smells that show your architecture has poor testability include:
(1) Too many production issues,
(2) Pre-release regression cycles,
(3) Lack of automation and exploratory testing,
(4) Hesitance to change code,
(5) Testing not considered during architectural design,
(6) Team seeking more testers,
(7) Too many slow UI tests,
(8) Important scenarios not tested,
(9) Ineffective unit and integration tests,
(10) Cluttered ineffective logging,
(11) Flaky nondeterministic automation,
(12) Tests with duplication and irrelevant detail,
(13) Issues difficult to reproduce and isolate.

These smells impact the team's ability to deliver value and cause team satisfaction to decrease.

I think if this model was deployed on a ring-fenced server with no internet access, with a means for the team to add more documents for context, it would go a long way towards addressing the privacy problem.
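
To make that a little more concrete, here is a rough sketch of that kind of self-contained pipeline, assuming the langchain-community and langchain-text-splitters packages, local Hugging Face embeddings and a FAISS index so nothing leaves the server. The file names are placeholders, and the final generation call is left to whichever privately hosted model the team settles on:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical file names - in the walkthrough these were my four testability smell PDFs.
pdf_paths = ["testability_smells_1.pdf", "testability_smells_2.pdf",
             "testability_smells_3.pdf", "testability_smells_4.pdf"]

documents = []
for path in pdf_paths:
    documents.extend(PyPDFLoader(path).load())

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Local embeddings and a local vector store, so documents never leave the ring-fenced server.
vectorstore = FAISS.from_documents(chunks, HuggingFaceEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

question = "Summarise the testing smells that show your architecture has poor testability"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))

# The RAG prompt from the walkthrough, with its explicit "don't make up an answer" instruction.
prompt = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n"
    f"{context}\n\nQuestion: {question}\nHelpful Answer:"
)

# Send `prompt` to the privately hosted model of your choice,
# e.g. the Bedrock client from the Day 27 sketch: print(llm.invoke(prompt).content)
```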

What opportunities does the approach provide you and your team?

Lots of testers share their exploratory testing notes on their team wiki, and I think it would be an awesome resource for the whole team to access as a model, for test idea and design inspiration. Or even link it to your test management tool. Jira, wikis and test management tools have a big flaw in that they are information-hiding tools, so anything that turns this around is a good thing.

It looks like I should take the Hugging Face course as well. 🙂

https://huggingface.co/learn/nlp-course/chapter1/1

Day 29

I have a few inspirations.

I remember, a few years ago, seeing this talk by Bill Matthews:

https://www.ministryoftesting.com/testbash-sessions/smart-algorithms-are-we-ready-for-this-bill-matthews

I’ve always liked Bill; he has the knack of looking a bit further into the future and spending time on technologies that might be classed as ‘speculative’ at the time, but we know differently now!

From Day 4 of this challenge we have Carlos Kidman:

https://www.ministryoftesting.com/testbash-sessions/ask-me-anything-artificial-intelligence-in-testing

He is knowledgeable and calm about AI (he will go and see rather than being cynical, like a good tester), and I thought this was such a great start to the challenge.

This TEDx Talk from Tanja Vos was also great in terms of future challenges:

A special mention to Mark Winteringham’s AI-Assisted Testing book, which is a great first go at creating an approach:

https://www.manning.com/books/ai-assisted-testing

In this challenge, I found inspiration in everyone who had a go at an AI in Testing Manifesto, and I think there is some wisdom of the crowd there:

https://club.ministryoftesting.com/t/day-21-develop-your-ai-in-testing-manifesto/75315/11

Day 30

In my mind, this is tied to the contemporary exploratory testing championed by Maaret Pyhäjärvi. We should use AI to create assistants that maximise what great testing looks like, rather than squeezing efficiencies out of what we already do.

Concepts such as:

  • The Automation Gambit – creating an executable specification that encourages learning how to test, being resourceful, and documenting, all at once.
  • Parameterising unit tests to maximise their value and find new information (see the sketch after this list).
  • Unattended testing – generating data for a report for example, automating the data creation, changing report generation parameters and checking the output.
  • Attended testing – using logs/events/metrics to augment your exploratory testing.
  • Bug fixing – fixing bugs together with developers, or fixing them yourself.
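
On the parameterising point, this is the sort of thing I mean. A minimal pytest sketch (in our own stack it would be Dart/Flutter, where parameterised test groups do the same job), with a hypothetical money-rounding helper standing in for real production code:

```python
import pytest

def round_to_pence(amount: float) -> float:
    """Hypothetical production helper: round a monetary amount to two decimal places."""
    return round(amount, 2)

# One test body, many cases: each extra row is a cheap way to go looking for new information.
@pytest.mark.parametrize(
    "amount, expected",
    [
        (0.0, 0.0),        # boundary: zero
        (19.99, 19.99),    # already two decimal places
        (2.675, 2.67),     # floating point surprise: 2.675 is stored just below 2.675
        (-2.675, -2.67),   # the same surprise with a negative amount
        (0.125, 0.12),     # exactly representable, so banker's rounding to even applies
    ],
)
def test_round_to_pence(amount, expected):
    assert round_to_pence(amount) == expected
```

The interesting rows are the ones that surprised us; a table like this makes it cheap to keep adding them as we learn.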

Envision the Perfect Assistant

I would call it Zenko, after the benevolent and clever fox from Japanese folklore that grows wiser with experience. Also, you can have a cool logo of a fox with lots of tails. Perhaps, as you learn together, more tails can be added.

With the above in mind, I would like the assistant to:

  • Build the executable specification as one tests, as shown above, but also be able to access the current specification as it stands to suggest coverage improvements or warn about duplication.
  • Suggest where a lower-level test (unit, component, integration) covers what you are testing, or how one might expand that coverage.
  • Expose the configuration/environment variables of a system and be able to change them on the fly, maybe best when testing against a locally running application.
  • Consume logs as part of the model and display snippets of them if you describe something as a bug in your notes or indicate a problem or question (a toy sketch of this idea follows this list).
  • Suggest areas of the code where the fix for a problem might be, so you can better point developers in the right location (or try to fix it yourself).
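
As a toy illustration of the log idea: watch the exploratory testing notes for words like “bug” or a question mark, and pull recent log lines that share vocabulary with the note so they can be pinned alongside it. The file names, trigger words and matching rule are all made up for the sketch; a real assistant would use the model to do the matching:

```python
import re
from pathlib import Path

# Hypothetical inputs: free-form exploratory testing notes and an application log file.
NOTES_FILE = Path("session_notes.md")
LOG_FILE = Path("app.log")
TRIGGER = re.compile(r"\b(bug|issue|problem|broken)\b|\?", re.IGNORECASE)

def interesting_notes(notes_path: Path) -> list[str]:
    """Return note lines that describe a bug or raise a question."""
    return [line.strip() for line in notes_path.read_text().splitlines()
            if TRIGGER.search(line)]

def matching_log_snippets(log_path: Path, note: str, limit: int = 5) -> list[str]:
    """Very naive matching: log lines that share any longer word with the note."""
    note_words = {w.lower() for w in re.findall(r"[a-zA-Z]{5,}", note)}
    hits = [line.strip() for line in log_path.read_text().splitlines()
            if note_words & {w.lower() for w in re.findall(r"[a-zA-Z]{5,}", line)}]
    return hits[-limit:]  # keep the most recent matches

if __name__ == "__main__":
    for note in interesting_notes(NOTES_FILE):
        print(f"\nNote: {note}")
        for snippet in matching_log_snippets(LOG_FILE, note):
            print(f"  log> {snippet}")
```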

I think the main thing for such a tool is to look at how we can make testing better and more teachable, rather than trying to hide or automate the skilful part.

I would be one of those frustrating product owners, I think, with a cool idea but a very loose grasp on how to make it a reality!