While we continue to develop Tester H, it’s worth keeping some of the platform’s known limitations and issues in mind. Avoiding or working around them will help you get the most out of the platform. Please notify us of any other pain points or issues you come across.

Step limits

A single action block, for example “When I log in using this <username> and this <password>”, is limited to 30 steps. However, an action that genuinely needs 30 steps is neither likely nor recommended, as it can overcomplicate the scenario and compromise the quality of the test suite itself.

Access to websites behind a proxy

Tester H operates by directly accessing websites via standard web protocols. This means that Tester H cannot access or process any website or web resource that is not publicly available on the internet.

Providing Tester H with unreachable URLs (e.g., internal network addresses, sites behind firewalls without proper access configuration, or non-existent domains) will always result in a test failure.

To ensure successful test execution, please verify that all URLs provided to Tester H are publicly accessible and correctly configured for external access.
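As a rough illustration of such a check, the Python sketch below flags URLs whose host is a private, loopback, or otherwise non-public address, or whose domain does not resolve. The function name `is_publicly_routable` is hypothetical and not part of Tester H. A passing result is only a hint: a firewall can still block external access, and DNS may resolve differently inside your own network.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_publicly_routable(url: str) -> bool:
    """Return True if the URL's host maps only to public IP addresses.

    Hypothetical pre-flight helper; it is not part of Tester H.
    """
    host = urlparse(url).hostname
    if host is None:
        return False  # malformed URL or missing scheme

    try:
        # Host is a literal IP address: no DNS lookup needed.
        candidates = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            # Host is a name: resolve it from the machine running this check.
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror:
            return False  # domain does not resolve
        # Drop any IPv6 scope suffix ("%eth0") before parsing the address.
        candidates = [
            ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos
        ]

    # Private, loopback, and link-local ranges all report is_global == False.
    return bool(candidates) and all(addr.is_global for addr in candidates)
```

For example, `is_publicly_routable("http://192.168.1.10/login")` returns `False` because 192.168.0.0/16 is a private range, so a test pointed at that address would be expected to fail.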

Final state verification

Tester H is designed solely to verify the final visible state of a webpage. It cannot perform any actions (like clicking, typing, or scrolling) at this final stage of evaluation.

This means that any condition or outcome you ask Tester H to check must be visually obvious and completely present on the final landing page it’s testing. If the expected state requires an interaction to be revealed, or if it’s on a subsequent page, Tester H will report a failure.

Prompt instructions that depend on brief, transient visual changes are also unlikely to work. For example, “a pop-up appears for two seconds” cannot be reliably verified, as Tester H captures a static final state, not a live interaction.

To ensure successful evaluation by Tester H, make sure your prompt instructions apply to webpage elements that are immediately and fully visible on the final page of your test flow.

Non-idempotent actions in prompts

A “non-idempotent” prompt is one that does not produce the same result each time it runs: executions after the first may behave differently and lead to a failure.

For example, if your prompt instructs an agent to:

  • Log in: The first time it runs, it logs in. Rerunning it might cause a failure if the system detects an already active session or expects specific state changes upon login that are only valid once.
  • Add a user to a group: The first time it runs, the user is added. Rerunning it will likely fail because the user is already a member of that group.

Tester H (or any agent) expects a consistent environment. If an action within a test can’t be safely repeated without altering the expected outcome or causing an error, it’s considered non-idempotent in this context. This makes testing and debugging unreliable.

To ensure robust and repeatable tests, design your prompts and the underlying test environment to be as idempotent as possible. This often means ensuring that actions can be repeated without unintended side effects or failures.
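The difference can be sketched in a few lines of Python. The `GroupDirectory` class below is a toy, in-memory stand-in for a system under test. It is purely hypothetical and says nothing about how Tester H or your application is implemented. It contrasts a strict "add" operation that fails on a rerun with an "ensure" operation that can be repeated safely.

```python
class GroupDirectory:
    """Toy in-memory stand-in for a system under test (hypothetical)."""

    def __init__(self):
        self._members = {}  # group name -> set of user names

    def add_member_strict(self, group, user):
        """Non-idempotent: a second call with the same arguments raises."""
        members = self._members.setdefault(group, set())
        if user in members:
            raise ValueError(f"{user} is already a member of {group}")
        members.add(user)

    def ensure_member(self, group, user):
        """Idempotent: the end state is the same however often it runs."""
        self._members.setdefault(group, set()).add(user)

    def members(self, group):
        """Return a copy of the group's current members."""
        return set(self._members.get(group, set()))


directory = GroupDirectory()
directory.ensure_member("admins", "alice")
directory.ensure_member("admins", "alice")  # rerun: no error, same state
```

When the application itself cannot offer an "ensure"-style action, a comparable effect can often be achieved by resetting the test environment (for example, removing the user from the group) before each run.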

Complex interactions

Tester H performs less reliably when simulating complicated or scroll-heavy interactions, such as interactive maps with layered scrolling or e-commerce product lists with many filters.