
Self-Hosted Shield: LLM Verification Inside Your VPC

Run VeroQ Shield on your infrastructure. Your models, your data, nothing leaves your network. Two verification modes: groundedness (local) + factual (opt-in). Docker deploy in 30 seconds.

Every AI team building with RAG hits the same wall: you need to verify your LLM's output, but you can't send internal documents to a third-party API. Earnings data, patient records, legal briefs, classified material — none of it can leave your network.

VeroQ Shield (Self-Hosted) solves this. It runs inside your VPC, uses your own LLM provider, and verifies every claim your AI makes — without sending a single byte of your data externally.

Deploy in 30 Seconds

```bash
docker run -p 3000:3000 \
  -e LLM_BASE_URL=https://api.openai.com/v1 \
  -e LLM_API_KEY=sk-... \
  -e LLM_MODEL=gpt-5.4-mini \
  veroq/shield
```

That's it. Shield is running on your infrastructure, using your LLM provider. For air-gapped environments, point it at a local Ollama or vLLM instance instead:

```bash
docker run -p 3000:3000 \
  -e LLM_BASE_URL=http://localhost:11434/v1 \
  -e LLM_API_KEY=none \
  -e LLM_MODEL=llama3 \
  veroq/shield
```

No internet connection required. Fully air-gapped.

Two Verification Modes

Self-hosted Shield has two distinct verification modes that solve different problems. Most teams need both.

Mode 1: Groundedness

“Did the LLM stay faithful to the retrieved documents?”

Pass the LLM's response and your retrieved context. Shield extracts each claim and cross-references it against the documents. Runs entirely locally — your LLM does the extraction and verification. No data leaves your network.

```bash
curl -X POST http://localhost:3000/shield \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The company reported $2.4B in Q3 revenue, up 15% YoY.",
    "context": "Q3 2024 Earnings: Revenue was $2.1B, representing 12% YoY growth.",
    "mode": "groundedness"
  }'
```

Response: both claims contradicted. Revenue was $2.1B (not $2.4B), growth was 12% (not 15%). Caught in 2.4 seconds, all local.
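
For illustration, a groundedness response might look like the sketch below. The field names here are assumptions chosen for readability, not Shield's documented schema; only the verdicts reflect the example above.

```json
{
  "claims": [
    { "text": "The company reported $2.4B in Q3 revenue", "verdict": "contradicted" },
    { "text": "Revenue was up 15% YoY", "verdict": "contradicted" }
  ]
}
```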

Mode 2: Factual

“Are the claims actually true in the real world?”

This mode verifies claims against live external evidence via the VeroQ cloud API. It's opt-in — only the extracted claim text is sent, never your documents, context, or metadata.
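
If you opt in, a factual check can be sketched as below, assuming the same `/shield` endpoint and request shape as the groundedness example. No `context` field is needed, since verification runs against external evidence rather than your documents:

```bash
curl -X POST http://localhost:3000/shield \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Apple had $124B in Q1 2025 revenue.",
    "mode": "factual"
  }'
```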

Why You Need Both Modes

This is the insight most teams miss: a RAG system can be perfectly grounded and still completely wrong.

Here's a real example from our tests:

Claim: “Apple had $124B in Q1 2025 revenue”

Internal document: “Apple Q1 FY2025 revenue was $124.3B per the 10-Q filing”

Groundedness: SUPPORTED — matches the internal document

Factual: CONTRADICTED — real-world data shows $144B

The source document was three months old. The LLM faithfully cited it. The groundedness check passed. But the number was $20 billion off.

Without both verification layers, this error ships to production. With both, you catch it — and you know why it happened (stale source document vs. LLM fabrication).
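
The decision logic above can be sketched as a small shell function. This is a hypothetical illustration, not the Shield API; the verdict strings are assumptions modeled on the example:

```bash
#!/bin/sh
# Hypothetical sketch: classify a failure by combining both verdicts.
# SUPPORTED / CONTRADICTED verdict strings are assumptions, not Shield's schema.
diagnose() {
  groundedness=$1
  factual=$2
  if [ "$groundedness" = "SUPPORTED" ] && [ "$factual" = "CONTRADICTED" ]; then
    # Faithful to the retrieved context, but the context itself is wrong.
    echo "stale source document"
  elif [ "$groundedness" = "CONTRADICTED" ]; then
    # The response is not supported by the retrieved context.
    echo "LLM fabrication"
  else
    # Both checks pass.
    echo "ok"
  fi
}

diagnose SUPPORTED CONTRADICTED   # the Apple example above
```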

Bring Your Own LLM

Shield works with any OpenAI-compatible API. Use the model you're already running:

Provider        Base URL                       Air-Gapped?
OpenAI          api.openai.com/v1              No
Azure OpenAI    YOUR.openai.azure.com/...      No
Ollama          localhost:11434/v1             Yes
vLLM            localhost:8000/v1              Yes
NVIDIA NIM      integrate.api.nvidia.com/v1    No
Groq            api.groq.com/openai/v1         No

Security & Compliance

  • Data isolation: Groundedness mode never sends data outside your network.
  • Factual mode is opt-in: Only extracted claim text (never your documents, context, or metadata) is sent to the VeroQ API.
  • No persistent storage: Shield is stateless. No database, no logs, no telemetry.
  • Compliance ready: Runs on your already-audited infrastructure with your approved LLM provider. No new vendor to vet.
  • Open source: Full source code on GitHub. Audit it, modify it, deploy it your way.

Get Started

Self-hosted Shield is open source and MIT licensed. Deploy it now: