What if an AI agent could not only write code but also verify its functionality? QF-Test, the professional UI test automation tool from Quality First Software (QFS), provides a clear solution with its ready-to-use, staged approach to agentic testing.

From Recording to Instruction: How Test Automation Is Fundamentally Changing

Anyone who develops software knows the problem: every update, every new feature, and every code change needs to be verified. Does everything still work the way it did before? Is the application displaying what it’s supposed to display? Traditional test automation has already simplified this process considerably: a tool takes over the clicking, checks the results, and runs through everything automatically again and again, reliably, repeatably, and fast. QF-Test has been used for exactly this purpose for years, covering web and desktop applications, Java GUIs, mobile apps, and PDF documents.

But now the game is changing again: the same development teams that today use Claude Code, Cursor, or GitHub Copilot to write code also want to use those same agents for testing, without ever having to open the test tool. “Our customers are incredibly interested in figuring out which parts of the testing process they can hand off to AI,” explains Max Melzer, software developer and trainer at QFS. This comes as no surprise: anyone who has watched an AI agent independently refactor code will inevitably ask why that same agent can’t handle testing too.

Three Levels of Agentic Testing with QF-Test

QF-Test takes a staged approach — from initial AI-assisted checking capabilities all the way to full integration into agentic development pipelines. All three levels build on one another and are part of ongoing product development.

Level 1: AI as a Semantic Checker — Available Today

The first form of agentic testing in QF-Test is already ready for production use: AI-assisted checks for results that can’t be captured with simple pass/fail checks.

A prime example is testing integrated chatbots. Their responses are never word-for-word predictable — and yet the test still needs to verify whether the response is correct in substance. QF-Test passes the result to a language model for evaluation. Is this response plausible? Does it fall within the expected parameters? The model makes a decision, and QF-Test incorporates the result into the test run. Semantic testing instead of rigid string comparisons.
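The idea of a semantic check can be sketched in a few lines of Python. This is an illustration only, not QF-Test's actual API: the names `Verdict`, `semantic_check`, and `stub_judge` are hypothetical, and the judge here is a simple keyword stand-in where a real setup would call a language model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    reason: str


# Hypothetical judge signature: (response, expectation) -> Verdict.
# In a real setup this would wrap a call to a language model.
Judge = Callable[[str, str], Verdict]


def semantic_check(response: str, expectation: str, judge: Judge) -> Verdict:
    """Delegate the pass/fail decision to a judge instead of comparing strings."""
    if not response.strip():
        return Verdict(False, "empty response")
    return judge(response, expectation)


def stub_judge(response: str, expectation: str) -> Verdict:
    """Stand-in for a model call: passes if every expected keyword occurs."""
    missing = [w for w in expectation.lower().split() if w not in response.lower()]
    return Verdict(not missing, "ok" if not missing else f"missing: {missing}")


verdict = semantic_check(
    "You can return the item within 30 days for a full refund.",
    "refund 30 days",
    stub_judge,
)
print(verdict.passed, verdict.reason)
```

Because the judge is passed in as a parameter, the test logic stays the same whether the verdict comes from a stub, as here, or from a model.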

Level 2: AI Generates Test Suites — Starting with QF-Test 11

The next step is the automatic generation of entire test suites — and it’s considerably more ambitious. Instead of recording or writing tests manually, AI handles the first draft.

Based on existing test plans or Jira tickets, QF-Test imports structured requirements and generates a test suite in the correct QF-Test format. QF-Test can also explore the application independently. The AI navigates through menus and dialogs, identifies testable areas, and proposes a test plan — all without human guidance.

“It’s never the case that you can just take it and say: we’re done now,” explains Max Melzer. “But it takes a lot of work off your plate — especially at the start.” A solid first draft for experienced testers to build on. That’s the realistic value of agentic test generation today.

Level 3: QF-Test as an MCP Server — The Real Breakthrough

This is the heart of the agentic approach: with QF-Test 11, the tool gains an integrated MCP server. MCP stands for Model Context Protocol — an open standard that allows AI agents to access external tools in a standardized way.

In practice, this means: anyone using Claude Code or another AI coding agent in their development environment can give that agent access to QF-Test’s capabilities. The agent can then — without ever manually opening QF-Test — launch applications, run tests, check results, and report back errors. QF-Test becomes part of the agentic workflow: a tool the AI agent calls just like any other.
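Under the hood, MCP messages are JSON-RPC 2.0, and an agent invokes a tool with a `tools/call` request. The following Python sketch shows what such an exchange might look like. The tool name `run_test_suite`, its arguments, and the sample response text are assumptions made for illustration; the tools the QF-Test MCP server actually exposes are not described here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def summarize_result(raw: str) -> str:
    """Turn a tools/call response into a one-line summary for the agent log."""
    msg = json.loads(raw)
    if "error" in msg:  # JSON-RPC level failure (e.g. unknown method)
        return f"protocol error: {msg['error']['message']}"
    result = msg["result"]
    text = " ".join(
        item["text"] for item in result.get("content", []) if item.get("type") == "text"
    )
    return ("FAILED: " if result.get("isError") else "OK: ") + text


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(1, "run_test_suite", {"suite": "login.qft"})

# Hand-written sample response in the shape MCP defines for tool results.
sample_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "12 tests run, 0 failures"}],
        "isError": False,
    },
})
print(summarize_result(sample_response))
```

In day-to-day use, the coding agent's MCP client builds and parses these messages itself; the sketch only makes visible what "calling QF-Test like any other tool" means on the wire.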

An experimental version of this MCP server is already available as a public preview. Early customers are integrating it into their CI/CD pipelines and development environments.

QF-Test’s Edge in Agent-Based Testing: Breadth Beats Niche

What sets QF-Test apart in this space isn’t MCP capability alone — specialized tools like Playwright now offer that too. The decisive difference lies in platform breadth: unlike many competitors that focus exclusively on web applications, QF-Test tests native Windows applications, Java GUIs, Android and iOS apps, and PDF documents using the same approach. Organizations running a heterogeneous application landscape in their development environment — which is more the rule than the exception in enterprises — get a unified test agent for all of it with QF-Test.

Platform	QF-Test	Playwright
Web Applications	✓	✓
Native Windows Apps	✓	–
Java GUIs (Swing, SWT, JavaFX)	✓	–
Android & iOS	✓	–
PDF Documents	✓	–

Where Things Stand Today: What’s Available, What’s Coming

QFS has made a deliberate choice to communicate its AI features ahead of the official release. A dedicated AI Checks webinar demonstrates what’s already possible today. The generation features and the integrated MCP server will be introduced with QF-Test 11, planned for 2026.

Anyone who wants to try it out sooner can reach out to QFS directly. Preview versions are being shared selectively with interested customers and partners to gather real-world feedback from live projects.

FAQ: Agentic Testing with QF-Test

What is agentic testing?

Agentic testing is an approach to software quality assurance in which artificial intelligence (AI) agents independently plan, execute, and evaluate test tasks. Here, autonomy and reproducibility are not mutually exclusive. QF-Test takes a hybrid approach: firmly defined test building blocks form a reliable foundation, and the AI agent decides which blocks to use and in what order. This makes test runs reproducible and traceable without requiring every step to be defined in advance. While fully autonomous testing is possible with QF-Test, QFS recommends the structured hybrid approach for production environments.
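The hybrid idea can be illustrated with a minimal Python sketch. Everything in it is hypothetical (the block names, the `BLOCKS` registry, and `run_plan` are not QF-Test constructs): the building blocks are fixed in advance, the agent only supplies the order, and the runner rejects any plan that refers to a block that does not exist.

```python
from typing import Callable, Dict, List

# Fixed, human-defined building blocks. Each one mutates a shared state dict.
BLOCKS: Dict[str, Callable[[dict], None]] = {
    "open_login": lambda s: s.update(page="login"),
    "submit_credentials": lambda s: s.update(page="dashboard", user="demo"),
    "check_dashboard": lambda s: s.update(checked=(s.get("page") == "dashboard")),
}


def run_plan(plan: List[str]) -> dict:
    """Run an agent-proposed sequence of blocks; refuse anything undefined."""
    unknown = [name for name in plan if name not in BLOCKS]
    if unknown:
        raise ValueError(f"plan uses undefined blocks: {unknown}")
    state: dict = {"log": []}
    for name in plan:
        BLOCKS[name](state)
        state["log"].append(name)  # the log makes the run traceable afterwards
    return state


# In a real setup the plan would come from the AI agent; here it is hard-coded.
plan = ["open_login", "submit_credentials", "check_dashboard"]
state = run_plan(plan)
print(state["checked"], state["log"])
```

The same plan always produces the same log, which is what makes an agent-chosen run reproducible.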

How does agentic testing differ from traditional test automation?

Traditional test automation uses fixed scripts: step A, then step B, then check C. If the application’s interface changes, the script breaks. QF-Test takes a hybrid approach: the AI agent works with clearly defined test building blocks and decides which ones to use and how, combining the flexibility of agent-based systems with the reproducibility demanded by reliable test automation.

What is an MCP server in the context of test automation?

MCP stands for Model Context Protocol, an open standard that facilitates communication between AI agents and external tools. An MCP server makes a tool’s capabilities (such as those of QF-Test) available to AI agents, such as Claude Code or Cursor. These agents can then call the tool as if it were a native function, eliminating the need for a human to manually operate the test environment.

Can QF-Test integrate with AI agents like Claude Code or GitHub Copilot?

Yes. With the MCP server from QF-Test, which is already available as a preview and will be fully integrated with QF-Test 11, AI agents like Claude Code can access QF-Test directly to launch tests, retrieve results, and control applications.

What platforms does QF-Test support for agentic testing?

QF-Test supports agentic testing for web applications, native Windows applications, Java GUIs (Swing, SWT, JavaFX, and Eclipse RCP), as well as Android and iOS apps and PDF documents. This makes QF-Test significantly broader in scope than pure web testing tools like Playwright.

When will QF-Test 11 with the full AI features be released?

The release of QF-Test 11 with integrated AI features, including test suite generation and a production-ready MCP server, is planned for 2026. An experimental preview version of the MCP server is currently available.

Does agentic testing replace human testers?

No. AI-generated test suites provide a structured first draft that experienced testers must further develop and validate. The value lies in reducing the effort required to create and maintain tests, not in replacing human expertise entirely.