# 30 Days of AI in Testing – Days 6-10

I'm currently doing the 30 Days of AI in Testing challenge offered by the excellent Ministry of Testing.

https://www.ministryoftesting.com/events/30-days-of-ai-in-testing

This blog is a record of Days 6 to 10, rather than having them just on the Club. It's quite a lot of writing and I like to have my writing in one place where possible.

## Day 6

Select a Testing Need: Choose one application of AI in testing that meets a testing need you’re interested in (e.g. test case generation, test data management, etc.).

  • Analysing test automation results to provide information on how effective your tests are: which do and don’t fail often, which take a long time to run, which cover areas that actually change. This would be deep analysis if a human were to do it, taking a long time.
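
To give a sense of what the manual version of this analysis involves, here's a minimal Python sketch that aggregates a folder of JUnit XML reports and flags flaky or slow tests. The report location and the thresholds are my own assumptions; the AI tools below aim to go much deeper, e.g. classifying failure causes.

```python
# A rough sketch of the human version of this analysis: aggregate JUnit XML
# reports (file location and thresholds are assumptions) and flag
# frequently failing or slow tests.
import glob
import xml.etree.ElementTree as ET
from collections import defaultdict

stats = defaultdict(lambda: {"failures": 0, "runs": 0, "total_time": 0.0})

for report in glob.glob("reports/junit-*.xml"):  # hypothetical report files
    for case in ET.parse(report).getroot().iter("testcase"):
        name = f'{case.get("classname")}.{case.get("name")}'
        stats[name]["runs"] += 1
        stats[name]["total_time"] += float(case.get("time", 0))
        if case.find("failure") is not None or case.find("error") is not None:
            stats[name]["failures"] += 1

for name, s in sorted(stats.items(), key=lambda kv: -kv[1]["failures"]):
    fail_rate = s["failures"] / s["runs"]
    avg_time = s["total_time"] / s["runs"]
    if fail_rate > 0.2 or avg_time > 30:  # arbitrary cut-offs
        print(f"{name}: fail rate {fail_rate:.0%}, avg {avg_time:.1f}s")
```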

Research and Analyse AI Testing Tools: Next, research three or more AI testing tools that use AI to address your identified testing need. Create a list of several tools, make pertinent notes and compare them on requirements and features that matter to you.

  • Report Portal – https://reportportal.io/:
    • Can provide unified test reporting for different levels of tests across systems
    • Can be integrated with CI/CD pipelines
    • Auto analysis to try and pinpoint whether it’s a system bug, a problem with the automation, test data, environments, etc.
    • Can detect recurring or unique errors and classify them for you.
    • Logs, screenshots, video recordings and network traffic in test reports.
    • Has a demo project that you can go and explore.
  • Applitools Test Insights – https://applitools.com/analyze/test-insights/
    • Single dashboard as part of Applitools Eyes.
    • Used to manage your tests within that product.
    • Wouldn’t be able to integrate results of all tests for my system (unit, integration, API) and see trends there.
    • Does integrate with CI/CD via an API.
    • It’s not really clear how AI assists though, unless I missed it.
  • Webomates – https://www.webomates.com/blog/automation-testing/smart-insights-swift-releases-harnessing-ai-in-test-automation-reporting/
    • The interestingly named Webomates talks about AiHealing, which uses test execution data and build release notes to detect changes.
    • Where it uses your test automation reports to do one of the following:
      • Heals automated tests – locators, timeout changes.
      • Feature modified – picks out the tests that need to change and regenerates them
      • New tests needed – can be reviewed and added
    • I found this tool slightly confusing; the wording around ‘test’, ‘case’ and ‘script’ was inconsistent.
    • Does integrate with CI/CD via an API.

Report Portal went into way more detail and had a test instance to poke around in, so I am favouring that tool initially.

## Day 7

Research Prompt Engineering: Conduct some research on effective prompt engineering techniques.

From OpenAI

  • Use the latest model; they tend to be much more capable. Usually paid for, as well.
  • Separate the instruction and the context to make it clearer for the model.
  • Be specific about outcome, length, format and style.
  • Articulate the desired output format through examples, like as a table or list or whatever you need.
  • Start with zero shot, then few shot and then fine-tune. Took me a while to get my head around this one; it’s all about examples. Give the model a few examples to get warmed up, then your target text.
  • Reduce fluffy and imprecise descriptions.
  • Instead of saying what not to do, say what to do instead.
  • Use leading words to get the model started. If you’re trying to generate Python code, ending your prompt with `import` will improve the output.

Some of these seem intuitive and pertain to good writing in general, I think. Many sites replicated these with subtle variations.

I would add specifying a role too. I want you to be a marketing expert or solutions architect or whatever.
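
Here's a minimal sketch of a few of these techniques combined in one call, using the OpenAI Python client. The model name and the task itself are placeholders, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # "use the latest model" -- substitute whatever is current
    messages=[
        # Specifying a role, as suggested above.
        {"role": "system", "content": "You are an expert Python developer."},
        # Instruction and context separated by delimiters, ending with a
        # leading word ("import") to nudge the model towards code output.
        {"role": "user", "content": (
            "Write a function that lists the failing tests in a JUnit XML report.\n\n"
            'Context: """The reports are produced by pytest with --junitxml."""\n\n'
            "import"
        )},
    ],
)
print(response.choices[0].message.content)
```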

Share Your Findings: Share 2-3 prompt engineering techniques you found that seem relevant, useful or new to you in reply to this topic. Feel free to link to any helpful resources you found as well.

I liked this site best I think: https://www.promptingguide.ai/introduction/elements

My brain liked how they broke it down:

  • Instruction – a specific task or instruction you want the model to perform
  • Context – external information or additional context that can steer the model to better responses
  • Input Data – the input or question that we are interested to find a response for
  • Output Indicator – the type or format of the output.

I would add the role you want the model to play, either folded into the context or as its own section. I will be using similar structures in future, I think.
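
As a quick illustration of that structure, here's how the elements (plus a role) might be assembled into one prompt. The helper function and the sample values are my own invention.

```python
def build_prompt(role: str, instruction: str, context: str,
                 input_data: str, output_indicator: str) -> str:
    """Assemble a prompt from the elements above, plus a role section."""
    return "\n\n".join([
        f"Role: {role}",
        f"Instruction: {instruction}",
        f'Context: """{context}"""',  # delimiters keep context clearly separated
        f"Input: {input_data}",
        f"Output format: {output_indicator}",
    ])

print(build_prompt(
    role="software testing professional",
    instruction="Create test scenarios for a shopping cart.",
    context="A British ecommerce site selling paper books only.",
    input_data="Adding and removing items from the cart.",
    output_indicator="A short narrative suitable for a Product Owner.",
))
```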

I also liked the idea of Constitutional AI: https://arxiv.org/abs/2310.13798

You provide your prompt, then a set of principles with which the generative AI judges the outcome of the prompt. You might ask for options for how to deal with a situation, but want to apply certain principles to it and get the model to filter its responses through them.
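
A rough sketch of that critique-and-revise loop, again with the OpenAI client; the principles, the task and the model name are all invented for illustration.

```python
from openai import OpenAI

client = OpenAI()

PRINCIPLES = (
    "1. Prefer reversible actions.\n"
    "2. Never risk customer data.\n"
    "3. Flag anything that needs release-manager sign-off."
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# First pass: get candidate options for the situation.
draft = ask("Suggest three ways to handle a failing release candidate.")

# Second pass: the model judges its own output against the principles
# and filters/rewrites its responses through them.
print(ask(
    f"Review the response below against these principles:\n{PRINCIPLES}\n\n"
    f"Response:\n{draft}\n\nRewrite it so every suggestion complies."
))
```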

## Day 8

### Beginner Level

Generate Basic Test Scenarios: Create a prompt that generates test scenarios for a common requirement, such as signing up for an online platform like the Ministry of Testing (MoT). Focus on crafting a prompt that makes the LLM create a story-like scenario.

  • Instruction – please can you create test scenarios for adding and removing items to a shopping cart on an ecommerce website.
  • Context – The site is for the British market only, sells paper books and no other products. You can buy multiple of each, hardbacks and paperbacks. Once the shopping cart session is done, you can then checkout and pay.
  • Role – I want you to respond as a software testing professional.
  • Output Indicator – I would like the output format to be a narrative, so I can describe it to the Product Owner, who is invested in the quality of the product and wants to know what would be tested.
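
(As an aside, this is roughly how the same structured prompt could be sent programmatically via the anthropic Python client. The model name and max_tokens value are assumptions.)

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

prompt = (
    "I want you to respond as a software testing professional.\n\n"
    "Please create test scenarios for adding and removing items "
    "in a shopping cart on an ecommerce website.\n\n"
    'Context: """The site is for the British market only and sells paper '
    "books (hardbacks and paperbacks) and nothing else. You can buy "
    'multiples of each. Once the cart session is done, you check out and pay."""\n\n'
    "Output a narrative I can read to the Product Owner."
)

response = client.messages.create(
    model="claude-3-opus-20240229",  # assumed model name
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```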

Claude said:

*clears throat* As a seasoned software testing professional, I understand the importance of thoroughly testing the shopping cart functionality for an e-commerce website. In this case, we're dealing with a British online bookstore that sells paper books, both hardbacks and paperbacks, with the ability to purchase multiple copies of each title.

To ensure a seamless shopping experience, we'll need to cover various scenarios related to adding and removing items from the shopping cart. Here's how I would approach this:

First and foremost, we'll test the basic functionality of adding a single book to the cart. This includes verifying that the correct book details (title, author, price, etc.) are displayed in the cart, and that the total cost is calculated accurately. We'll also check that the cart reflects the correct quantity when multiple copies of the same book are added.

Next, we'll explore scenarios involving the addition of multiple books to the cart. This could include combining hardbacks and paperbacks, adding books from different categories or publishers, and ensuring that the cart correctly calculates the total cost based on the prices of the individual items.

Since the site caters to the British market, we'll need to test scenarios related to currency formatting and localization. For instance, we'll verify that prices are displayed in British pounds (£) and that any relevant taxes or fees are correctly applied based on the user's location within the UK.

Removing items from the cart is equally important, so we'll test various scenarios such as removing a single book, multiple copies of the same book, or clearing the entire cart. We'll also check that the total cost is updated accordingly after each removal.

Additionally, we'll test edge cases and boundary conditions, such as attempting to add an invalid or non-existent book to the cart, adding an excessive number of copies beyond the available stock, or trying to checkout with an empty cart.

Throughout the testing process, we'll also pay attention to the user experience aspects of the shopping cart. This includes verifying that the cart is easily accessible from different pages on the website, that clear visual indicators are present when items are added or removed, and that any error messages or notifications are informative and user-friendly.

By thoroughly testing these scenarios, we can ensure that the shopping cart functionality meets the expectations of the British book-buying audience and provides a smooth and reliable experience for customers. Of course, this is just a high-level overview, and we may need to adapt our testing approach based on specific requirements or additional features of the website.

Claude said of my prompt:

Your initial prompt was well-structured and provided the necessary context and instructions for me to formulate an appropriate response. Here's a breakdown of what worked well:

1. Clear instructions: You specified that you wanted me to create test scenarios for adding and removing items to a shopping cart on an e-commerce website.

2. Relevant context: You provided important details about the website, such as it being for the British market, selling only paper books (hardbacks and paperbacks), and allowing users to purchase multiple copies of each book.

3. Defined role: You explicitly stated that you wanted me to respond as a software testing professional, which helped me tailor my language and approach accordingly.

4. Output format: You specified that you wanted the output in a narrative format, suitable for describing the test scenarios to a Product Owner interested in the quality of the product.

By including these elements in your prompt, you effectively set the stage for me to understand the task at hand, the relevant background information, and the expected tone and structure of the response.

One potential area for improvement could be to provide additional details or requirements, if available. For example, you could specify if there are any particular areas of focus (e.g., usability, performance, security), or if there are any specific features or functionalities within the shopping cart that need special attention.

Overall, your prompt was clear, concise, and provided the necessary information for me to generate a relevant and appropriate response for the given context.

I like the feedback to add usability, performance and security needs too.

Format-Specific Test Scenarios: Build on the previous task by specifying the output format. This could be Behavior Driven Development (BDD) syntax or a CSV file tailored for upload into a test management tool. See how the format changes the usefulness and clarity of the scenario.

  • Instruction – for the bookseller ecommerce site, we also want to automate certain tests. This will be done using Cucumber and Selenium, so we need to create scenarios in the Gherkin format.
  • Context – we want to cover the ‘main’ scenarios, not all of the edge cases. Single and multiple book purchases particularly.
  • Role – as a tester who specialises in test automation and want to run tests against new builds
  • Output Indicator – Gherkin feature files, following the format of this example:

```gherkin
Feature: Account Holder withdraws cash

Scenario: Account has sufficient funds
Given the account balance is $100
And the card is valid
And the machine contains enough money
When the Account Holder requests $20
Then the ATM should dispense $20
And the account balance should be $80
And the card should be returned
```

Claude said:

*clears throat* As a skilled test automation engineer specializing in running tests against new builds, I understand the importance of creating well-structured scenarios in the Gherkin format using Cucumber and Selenium. For the bookseller e-commerce site, let's focus on the main scenarios, particularly single and multiple book purchases.

```gherkin
# Test scenarios for single book purchase

Feature: Single Book Purchase

Scenario: Add a single book to the cart
Given the user is on the book details page
When the user clicks the “Add to Cart” button
Then the book should be added to the cart
And the cart count should be updated to 1

Scenario: Proceed to checkout with a single book
Given the user has a book in the cart
When the user proceeds to checkout
Then the checkout page should display the book details
And the total cost should be calculated correctly

Scenario: Complete checkout for a single book purchase
Given the user is on the checkout page with a single book
When the user enters valid payment and shipping information
And submits the order
Then the order should be placed successfully
And a confirmation page should be displayed

# Test scenarios for multiple book purchases

Feature: Multiple Book Purchase

Scenario: Add multiple books to the cart
Given the user is on the book listing page
When the user adds multiple books to the cart
Then all books should be added to the cart
And the cart count should reflect the total number of books

Scenario: Update quantities in the cart
Given the user has multiple books in the cart
When the user updates the quantity for a book
Then the cart should reflect the updated quantity
And the total cost should be recalculated correctly

Scenario: Complete checkout for multiple book purchase
Given the user is on the checkout page with multiple books
When the user enters valid payment and shipping information
And submits the order
Then the order should be placed successfully
And a confirmation page should be displayed with all ordered books
```

These scenarios written in the Gherkin format using the Given-When-Then structure will help us automate the testing of the core functionality for single and multiple book purchases on the bookseller e-commerce site using Cucumber and Selenium. The scenarios cover adding books to the cart, updating quantities, proceeding to checkout, and completing the purchase process.

Please note that while these scenarios cover the main functionality, we may need to create additional scenarios to test edge cases, error handling, and other specific requirements based on the project’s needs.

I quite liked the split; the tests are long enough to be meaningful but not so long that they might become brittle. They build on each other well, describing the journey to the cart and through to completing the purchase.
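
To ground this, here's roughly what the first scenario could look like wired up as step definitions. I've sketched it with Python's behave and Selenium rather than Cucumber and Java to keep this post to one language; the locators and URL are hypothetical.

```python
# features/steps/cart_steps.py -- hypothetical step definitions for the
# "Add a single book to the cart" scenario above.
from behave import given, when, then
from selenium import webdriver
from selenium.webdriver.common.by import By

@given("the user is on the book details page")
def step_open_book_page(context):
    context.driver = webdriver.Chrome()
    context.driver.get("https://bookshop.example/books/1")  # placeholder URL

@when('the user clicks the "Add to Cart" button')
def step_add_to_cart(context):
    context.driver.find_element(By.ID, "add-to-cart").click()  # assumed locator

@then("the book should be added to the cart")
def step_book_in_cart(context):
    items = context.driver.find_elements(By.CSS_SELECTOR, ".cart-item")
    assert len(items) == 1  # assumed markup

@then("the cart count should be updated to 1")
def step_cart_count(context):
    count = context.driver.find_element(By.CSS_SELECTOR, ".cart-count")
    assert count.text == "1"  # assumed markup
```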

Claude said of my prompt:

Certainly! Your second prompt was well-crafted and provided clear instructions for the task at hand. Let me break it down:

Instruction
The instruction was straightforward – to create scenarios in the Gherkin format using Cucumber and Selenium for automating tests on the bookseller e-commerce site.

Context
You provided relevant context by specifying that we should cover the “main” scenarios, focusing on single and multiple book purchases, rather than all the edge cases.

Role
The role you assigned was that of a tester who specializes in test automation and wants to run tests against new builds. This helped me adopt the appropriate mindset and language for generating the scenarios.

Output Format
While you didn’t explicitly state the desired output format, the instruction to create scenarios in the Gherkin format implied that the output should follow the standard Gherkin syntax with Features, Scenarios, Given-When-Then steps.

One area where the prompt could have been slightly more detailed is in specifying any additional requirements or constraints for the scenarios. For example, you could have provided guidelines on the desired level of granularity, whether to include specific test data, or any particular areas of focus within the purchase flow.

Overall, however, your prompt was clear, concise, and provided the necessary information for me to generate relevant and well-structured test automation scenarios in the Gherkin format. The context and role helped me tailor the output to the specific needs of an automation tester working on the bookseller e-commerce site.

Effective prompt engineering is crucial for getting the most out of AI language models, and your second prompt demonstrated a good understanding of the key elements required to elicit a suitable response. Well done!

## Day 9

### Evaluate prompt quality and try to improve it

#### Research Evaluation Techniques

Research techniques for evaluating the quality of prompts and the outputs they generate. This could include criteria like clarity, specificity, bias, coherence, relevance or usefulness of the output, etc.

##### Prompts

For this I used this site, which seems comprehensive:

https://www.leewayhertz.com/prompt-engineering/

* Craft detailed and direct instructions
    * Use delimiters such as """ to separate sections
    * Request a structured output
    * Check whether conditions are met; for example, ask the LLM to respond with "no data provided" when none is given.
    * Provide examples of successfully completed tasks.
* Allow the model time to think
    * Break down complexity into smaller steps
    * Ask the model not to jump to conclusions
* Opt for the latest model
* Provide detailed descriptions
    * For example, if requesting a poem, specify details like length, style, format
* Use examples to illustrate the desired output format.
* Start with zero-shot, then few-shot and finally fine-tune.
* Eliminate vague and unnecessary descriptions
* Give direct instructions over prohibitions
* Use leading words for code generation.
* Provide a role
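
One way to apply criteria like these semi-automatically is to ask the model itself to score a prompt against them. A sketch, with the rubric wording and model name being my own:

```python
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the prompt below from 1-5 on each of: clarity, specificity, "
    "bias, coherence, and likely usefulness of its output. Return one "
    "line per criterion in the form 'criterion: score - reason'."
)

def evaluate_prompt(prompt_under_test: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{
            "role": "user",
            "content": f'{RUBRIC}\n\nPrompt: """{prompt_under_test}"""',
        }],
    )
    return response.choices[0].message.content

print(evaluate_prompt("Can you help me choose a mobile test automation tool?"))
```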

#### Choose an Approach

Select a prompt you created from yesterday’s task. Evaluate it using the criteria/techniques you discovered. Identify areas where the prompt or its output could be strengthened.

I asked:

Hi Claude. I would like to gradually refine a prompt from something simple to being a much better prompt.

Here goes. I would like to choose a tool to write some mobile test automation. Can you help?

Claude said:

That’s a good start, but the prompt is quite broad. To better assist you in choosing a tool for mobile test automation, I would need some additional context. Here are a few questions that could help refine the prompt:

  1. What type of mobile platforms are you targeting? iOS, Android, or both?
  2. Do you have a preference for a specific programming language?
  3. Are you looking for an open-source or commercial tool?
  4. What level of testing do you need to cover (functional UI testing, performance, security, etc.)?
  5. Do you need the tool to support specific testing frameworks or methodologies (e.g., Behavior-Driven Development)?
  6. Are there any specific requirements or constraints to consider (e.g., integrations with other tools, team skillsets, budget)?

By providing more details around the scope, requirements, and constraints of your mobile test automation needs, I can offer a more tailored recommendation for the most suitable tool(s). Feel free to iteratively refine the prompt with any additional relevant information.

I said:

Cool

Its a Flutter app
I would like to write the tests in Dart, then the developers can help
Open source please
Functional ui testing
No, I don't want another layer like Gherkin
I would like to run the tests in CodeMagic

Claude said:

Got it, thank you for providing those additional details. With the context that:

  1. You’re working with a Flutter app
  2. Want to write tests in Dart for collaboration with developers
  3. Prefer an open-source solution
  4. Need to cover functional UI testing
  5. Don’t require a BDD framework like Gherkin
  6. Need to run tests on CodeMagic CI/CD platform

Based on these requirements, I would recommend using the official Flutter Driver tool for your mobile test automation needs.

Flutter Driver is an integrated test package provided by the Flutter team. It allows you to write automated UI tests in Dart, the same language used for developing Flutter apps. This aligns well with your preferences for open-source, using Dart, and avoiding an additional layer like Gherkin.

Some key advantages of Flutter Driver include:

  • Native integration with Flutter framework
  • Ability to simulate user interactions like taps, scrolls, text entry
  • Support for running tests on real devices or emulators/simulators
  • Easy integration with CI/CD tools like CodeMagic

Additionally, Flutter Driver plays nicely with other testing libraries like test and mockito for unit and widget testing.

To set up Flutter Driver for your project, you can follow the official documentation: https://docs.flutter.dev/cookbook/testing/integration/introduction

This guide will walk you through installing dependencies, creating test files, and running the tests locally or on CI/CD. The Flutter community also provides many examples and resources.

Let me know if you need any other specifics or have additional requirements to consider for this mobile test automation setup.

We went back and forth. I was quite impressed with how we refined as we went. I asked Claude what a good prompt would have been:

Based on our discussion and the summary of the key points, here’s how I would structure a good prompt for this mobile test automation scenario:

Instruction: Recommend a strategy and tools for automating UI tests for a Flutter app that uses SMS OTP (One-Time Password) authentication via Amazon Cognito.

Context:

  • The app is developed using Flutter and Dart
  • Authentication is handled through Amazon Cognito User Pools
  • Users receive OTP codes via SMS for authentication
  • Automating the SMS OTP flow is challenging due to timing variability

Requirements:

  • Use open-source tools and libraries
  • Write tests in Dart for better collaboration with developers
  • Implement a solution to overcome the SMS OTP automation challenges
  • Ensure separation and security of test environment and data

Role: You are an experienced mobile test automation engineer tasked with setting up a robust and maintainable automated UI testing framework for this Flutter app.

Output: Provide a detailed strategy outlining the tools, libraries, and approaches you would use to meet the requirements, including specific recommendations for handling the SMS OTP authentication during automated tests. Explain the rationale behind your choices and any potential trade-offs or considerations.

By structuring the prompt this way, I’ve aimed to provide:

  1. Clear instructions on the task at hand (recommending a test automation strategy).
  2. Relevant context about the app, its tech stack, and the specific challenge (SMS OTP authentication).
  3. Explicit requirements that need to be addressed (open-source, Dart tests, OTP handling, test environment separation/security).
  4. A well-defined role for me to assume (experienced test automation engineer).
  5. Guidance on the expected output (detailed strategy with rationale and considerations).

This prompt encompasses all the key elements we discussed, allowing me to provide a comprehensive response tailored to the specific needs and challenges of this project. Let me know if you would modify or add anything to further improve the prompt structure.

I found it very engaging. 🙂

## Day 10

### Choose your AI Test Generation Tool

I'll stick with Claude, as I had done a lot of work with ChatGPT before.

Generate Scenarios: Experiment with how the level of detail provided to the tool impacts the quality of the generated tests.

I am a little pushed for time, so I asked Claude to generate just 'main path' tests for the bookstore scenario I used on Day 8.

```gherkin
# Test scenarios for single book purchase

Feature: Single Book Purchase

Scenario: Add a single book to the cart
Given the user is on the book details page
When the user clicks the “Add to Cart” button
Then the book should be added to the cart
And the cart count should be updated to 1

Scenario: Proceed to checkout with a single book
Given the user has a book in the cart
When the user proceeds to checkout
Then the checkout page should display the book details
And the total cost should be calculated correctly

Scenario: Complete checkout for a single book purchase
Given the user is on the checkout page with a single book
When the user enters valid payment and shipping information
And submits the order
Then the order should be placed successfully
And a confirmation page should be displayed

# Test scenarios for multiple book purchases

Feature: Multiple Book Purchase

Scenario: Add multiple books to the cart
Given the user is on the book listing page
When the user adds multiple books to the cart
Then all books should be added to the cart
And the cart count should reflect the total number of books

Scenario: Update quantities in the cart
Given the user has multiple books in the cart
When the user updates the quantity for a book
Then the cart should reflect the updated quantity
And the total cost should be recalculated correctly

Scenario: Complete checkout for multiple book purchase
Given the user is on the checkout page with multiple books
When the user enters valid payment and shipping information
And submits the order
Then the order should be placed successfully
And a confirmation page should be displayed with all ordered books
```

Review the scenarios: critically examine the scenarios generated and compare them to those that you, as a tester, might have created:

a. What did the tool do well?
It didn’t make the tests too long; in my experience humans often make these types of tests too long (many, many ‘Ands’), making it harder to pinpoint problems.
b. What did the tool not do so well with?
The update-quantities test for multiple books was quite vague: the quantity could go up or down (or to zero), and the resulting test would be quite different in each case.
Some of the tests use non-declarative language (‘clicks’), but not egregiously.
c. How straightforward was it to express the inputs to generate the tests?
Reasonably so, given the previous prompt had asked for a narrative version of what to test. I guess it says a lot for the gradual build of prompts, rather than trying for a one-shot.
d. Are the generated tests correct and ready to use?
No, I think they still need some refinement. A few of the referenced pages differ and need pinning down (the book listing and book details pages, for example).
e. How complete do you think the tests are? What does the tool miss?
More than answering this question, I just find it really interesting that Claude had a go despite everything that was missing. The number of assumptions made is pretty incredible. And the tests are quite convincing, until you do this exercise, of course!