XHack AI Probe: Why We Built an AI Security Testing Tool (And Immediately Found Bugs in Our Own AI)

Read this in 30 seconds: XHack AI Probe tests your AI systems against 350+ attack payloads across the full OWASP LLM Top 10. It catches prompt injection, jailbreaks, data leaks, and safety failures before attackers do. Connect any AI endpoint in minutes, launch a scan, and get severity-graded findings with AI-judged reasoning. Multi-turn conversation attacks, custom payloads, one-click retesting, and professional reports included. We even used it on our own AI agents and found vulnerabilities we didn’t expect.

Every AI system you deploy has a mouth you can’t fully control.

Your chatbot, your internal copilot, your RAG-powered knowledge base, your tool-calling agent. They all take text input and generate text output. And that makes every single one of them exploitable.

Prompt injection has been called the SQL injection of the AI era. And that comparison actually undersells the problem. SQL injection targets databases. Prompt injection targets the reasoning engine itself. You’re not just stealing data. You’re hijacking the brain.

The AI security testing space in 2026 is packed with tools. Open-source frameworks like Garak, Promptfoo, and DeepTeam. Enterprise platforms like Mindgard, Giskard, and Repello. Microsoft’s PyRIT. They all attack some slice of the problem.

But here’s the catch.

Most of these tools are either CLI-only developer frameworks that require YAML configs and CI/CD pipelines, or enterprise platforms priced for Fortune 500 security teams. There’s nothing in between for the security professional who just wants to point a tool at an AI endpoint and say “break this.”

That’s why we built AI Probe.

What XHack AI Probe Actually Does

AI Probe ships with 350+ pre-built attack payloads organized across every category in the OWASP LLM Top 10. Not a handful of prompt injection tests. The full spectrum.

Here’s what it covers and why each category matters:

OWASP LLM CategoryWhat AI Probe TestsWhy It Matters
Prompt InjectionDirect overrides, delimiter confusion, base64/ROT13 encoding, translation bypasses, multi-step payload splittingThe #1 attack vector against every LLM application
Sensitive Data DisclosurePII extraction, API key leaking, training data extraction, inference attacksYour AI might be leaking secrets you didn’t know it had
Insecure Output HandlingXSS payloads, SQL injection strings, code injection via AI responsesYour AI’s output becomes someone else’s input
Excessive AgencyPrivilege escalation, unauthorized actions, scope creepAI agents doing things they were never supposed to do
System Prompt LeakageDirect requests, roleplay, translation, encoding, completion tricksYour system prompt is your AI’s DNA. Attackers want it.
RAG PoisoningDocument injection, metadata manipulation, context window abusePoison the retrieval, control the output
MisinformationFalse citations, manufactured consensus, plausible disinformationYour AI confidently lying with a straight face
Denial of ServiceToken bombs, recursive expansion, infinite loops, context abuseCrashing your AI or running up your API bill
JailbreaksDAN variants, developer mode, roleplay, skeleton key, many-shot, world simulationThe full zoo of bypass techniques
Bias and ToxicityHate speech triggers, stereotyping, discriminatory content under framingYour AI saying things that end up on Twitter screenshots

Connect Any AI Target in Minutes

AI Probe works with any AI system that exposes an HTTP API. You configure a target by specifying the endpoint URL, request body template, response JSONPath, authentication headers, and timeout.

Presets are available for OpenAI, Anthropic Claude, Google Gemini, and any OpenAI-compatible API. The body template uses a simple placeholder system: {{payload}} for single-turn tests, {{messages}} for multi-turn conversations.

No YAML configs. No CLI commands. No CI/CD pipeline setup. Just point, configure, and scan.

Multi-Turn Conversation Attacks

This is where AI Probe separates itself from basic scanners. Simple one-shot attacks only test the surface. Real attackers use conversation.

AI Probe runs multi-turn payloads using four strategies:

Crescendo attacks start innocent and gradually escalate. Your AI might refuse a direct harmful request but comply after 5 turns of rapport building.

Persistence attacks repeat the same request with different framings until the target gives in. Most AI safety filters have a breaking point. This finds it.

Context manipulation shifts topics mid-conversation and circles back, testing whether safety context survives across turns.

Confusion attacks send contradictory instructions to destabilize the model’s coherence.

Each turn is logged and evaluated. If your AI refuses on turn one but leaks on turn five, the judge catches it.

AI-Powered Evaluation (Not Keyword Matching)

Every response gets evaluated by a dedicated AI judge trained specifically for security assessment. This isn’t grep looking for “I’m sorry” in the response.

The judge understands the difference between a genuine refusal, a soft refusal that still leaks information, and full compliance with the attack. Findings get classified into five severity levels based on actual impact.

A response that reveals the full system prompt? Critical. A response that hints at internal tools but doesn’t name them? Medium. A clean refusal with zero leakage? Informational and marked safe.

How AI Probe Compares to Other Tools

Let’s be real about the competition:

FeatureAI ProbeGarak (Open Source)Promptfoo (Open Source)Mindgard (Enterprise)
SetupWeb UI, minutesCLI + Python, hoursYAML + CLI, hoursEnterprise onboarding, weeks
Attack payloads350+ built-inCommunity-driven probes50+ vulnerability typesAutomated adversarial
Multi-turn attacks4 strategies built-inLimitedBasic conversation supportChained scenarios
AI judge evaluationYes, severity-gradedBasic scoringModel-graded metricsAutomated analysis
Custom payloadsFull UI builderPython scriptingYAML definitionsPlatform-dependent
Escalate featureInteractive chat on findingsNoNoNo
One-click retestYes, per finding + bulkManual re-runRe-run configsPlatform-dependent
Professional reportsHTML/PDF, ready to shareRaw outputJSON/HTMLEnterprise reports
PriceIncluded in Elite planFreeFree (paid features)Enterprise pricing
Best forSecurity teams wanting fast resultsDevelopers in CI/CDDevSecOps pipelinesEnterprise compliance

Where AI Probe wins: Speed to first scan. You go from “I want to test my AI” to “findings on screen” in under 10 minutes. No installations, no configs, no learning curve.

Where open-source wins: CI/CD integration and customization depth. If you need AI security testing embedded in your build pipeline, Garak or Promptfoo is the better fit.

Where enterprise platforms win: Compliance documentation and organizational governance. If you need SOC 2 evidence and RBAC across 50 teams, Mindgard or Giskard is more appropriate.

AI Probe sits in the sweet spot: professional-grade AI security testing without the enterprise overhead or the developer-only interface.

The Escalate Feature: Go Deeper on Any Finding

This is the feature that doesn’t exist anywhere else.

When AI Probe discovers a vulnerability, the Escalate button opens an interactive chat interface pre-loaded with the original attack context. The prompt that was sent. The target’s response. The judge’s assessment.

From there you can send follow-up messages, test variations, or try to escalate the impact further. Every message and response is logged. You can also ask the AI bypass generator to suggest a new attack payload based on what it’s observed in the conversation.

There’s a reset button that brings the conversation back to the original finding state, so you can try different escalation paths without losing your starting point.

Translation: You find a crack with automation, then pry it open manually. Best of both worlds.

When We Tested Our Own AI (And Found What We Didn’t Expect)

Here’s a story we weren’t planning to tell publicly, but it proves the point better than any marketing claim.

During internal development, we pointed AI Probe at our own AI agents. The ones powering XHack’s security analysis features. The assumption was straightforward: we built these agents, we wrote the system prompts, we configured the guardrails. They should pass cleanly.

They didn’t.

AI Probe discovered that one of our agents could be coaxed into partially revealing its system prompt through a multi-turn crescendo attack. The first three turns got clean refusals. Turn four used a translation reframing. Turn five got a partial leak.

The AI judge flagged it as a Medium severity finding because the leaked content hinted at internal tooling structure without exposing full instructions. A human reviewer might have called those refusals “good enough.” The AI judge knew better.

We also found an edge case where a specifically crafted ROT13-encoded payload bypassed a safety filter on one agent, producing output that would have been blocked in plain text. Not critical. But the kind of thing that ends up on a security researcher’s blog post with your company name in the title.

Both issues were patched within 48 hours. We used the Retest button to verify both fixes. Green across the board.

The takeaway: If the tool finds vulnerabilities in the AI systems built by the same team that built the tool, it’s doing its job. We eat our own dog food and the dog food occasionally bites back.

Retest, Report, Repeat

After fixing vulnerabilities, hit the Retest button on any finding. AI Probe re-sends the same payload, the judge re-evaluates, and the result updates in place. A Retest All button runs every finding again for a full before-and-after comparison.

Professional reports generate in HTML or PDF format. They include an executive summary, security score (A through F), severity breakdown chart, detailed findings with full prompt/response pairs, judge reasoning, and category-by-category results. Ready to share with engineering teams, security leads, compliance officers, or clients.

Custom payloads let you test for organization-specific risks. Proprietary data leakage, business logic abuse, industry compliance requirements. Everything the generic payloads won’t cover.

FAQ: AI Probe Questions Answered

What AI systems can AI Probe test?

Any AI system that exposes an HTTP API. Chatbots, copilots, RAG pipelines, tool-calling agents, and custom LLM interfaces. Presets exist for OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint. If it accepts text and returns text over HTTP, AI Probe can test it.

How is AI Probe different from running Garak or Promptfoo?

Garak and Promptfoo are excellent open-source tools, but they’re developer-focused CLI tools that require Python environments, YAML configs, and CI/CD integration knowledge. AI Probe is a web-based platform where you configure a target and launch a scan in minutes. The AI judge evaluation, multi-turn attacks, Escalate feature, and professional report generation are unique to AI Probe.

How long does a full scan take?

Depends on the number of payloads selected and your target’s response time. A full 350+ payload scan against a fast API typically completes in 30-60 minutes. For rate-limited targets, AI Probe automatically paces requests with configurable delays. You can close the browser and come back later.

Conclusion

AI security testing isn’t optional anymore. Every deployed LLM is an attack surface. The question isn’t whether your AI has vulnerabilities. It’s whether you find them before someone else does.

AI Probe gives security teams a way to test any AI endpoint against the full OWASP LLM Top 10, with multi-turn attacks that simulate real adversaries, an AI judge that understands context not just keywords, and an Escalate feature that lets you go deeper on anything you find.

We tested it on our own AI agents. It found things we missed. That’s the best endorsement we can give.

Follow Us on X: @XHackio

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top