Serverless · Cloudflare Workers · D1

One API. Nine AI models.
Zero complexity.

A production-ready REST orchestration layer that unifies Claude, Gemini, GPT-4.1, Mistral, Grok, Llama, Cohere, Qwen, and Perplexity behind a single endpoint — with fallback, memory, search, deep research, and embeddings built in.

Claude · Gemini · GPT-4.1 · Mistral · Grok · Llama / Groq · Cohere · Qwen · Perplexity

Everything you need. Nothing you don't.

Built on Cloudflare Workers and D1 — globally distributed, serverless, and ready for production from day one.

Unified endpoint

A single POST /chat call reaches any of the nine models. Switch providers by changing one parameter, modelTarget.

🔄

Automatic fallback

If the primary model fails, the system retries with the next provider in the chain, so a single provider outage does not reach the caller as an error. The response reports when a fallback was triggered.
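
The fallback idea can be sketched in a few lines of Python. This is an illustration only, not the project's code: the `call_with_fallback` helper, the two stand-in providers, and the `fallback_used` and `errors` fields are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider type: takes a prompt, returns text, raises on failure.
Provider = Callable[[str], str]


def call_with_fallback(prompt: str, chain: List[Tuple[str, Provider]]) -> Dict[str, object]:
    """Try each provider in order and return the first successful answer."""
    errors: List[str] = []
    for name, provider in chain:
        try:
            text = provider(prompt)
        except Exception as exc:  # a real worker would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        return {"model": name, "text": text, "fallback_used": bool(errors), "errors": errors}
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def failing_provider(prompt: str) -> str:
    """Stand-in for a provider that is currently down."""
    raise TimeoutError("upstream timed out")


def echo_provider(prompt: str) -> str:
    """Stand-in for a healthy provider."""
    return f"echo: {prompt}"


result = call_with_fallback("hello", [("gemini", failing_provider), ("claude", echo_provider)])
```

Here the first provider raises, so the answer comes from the second one and `fallback_used` is set, which mirrors the fallback info mentioned for the real response.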

🧠

Session memory

Each conversation session persists its last 10 turns in Cloudflare D1. Context follows the user across requests.
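
One way to keep only the last 10 turns is to prune on every write. The sketch below uses Python's built-in `sqlite3` as a local stand-in for D1 (which is SQLite-based). The `turns` table, its columns, and the assumption that one turn is one prompt/reply pair are illustrative, not taken from the product's schema.

```python
import sqlite3

MAX_TURNS = 10  # limit taken from the feature description

# In-memory SQLite stands in for D1 here; the schema is an assumption.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE turns ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT NOT NULL, prompt TEXT NOT NULL, reply TEXT NOT NULL)"
)


def save_turn(session_id: str, prompt: str, reply: str) -> None:
    """Store one turn, then delete everything older than the newest MAX_TURNS."""
    db.execute(
        "INSERT INTO turns (session_id, prompt, reply) VALUES (?, ?, ?)",
        (session_id, prompt, reply),
    )
    db.execute(
        "DELETE FROM turns WHERE session_id = ? AND id NOT IN ("
        "SELECT id FROM turns WHERE session_id = ? ORDER BY id DESC LIMIT ?)",
        (session_id, session_id, MAX_TURNS),
    )
    db.commit()


def load_context(session_id: str) -> list:
    """Return the stored turns for a session, oldest first."""
    rows = db.execute(
        "SELECT prompt, reply FROM turns WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    return rows.fetchall()


for i in range(12):
    save_turn("my-session-001", f"question {i}", f"answer {i}")

context = load_context("my-session-001")  # turns 2 through 11 remain
```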

🔍

Web search

Enable real-time search on any request with search: true. Powered by Tavily — up to 5 agentic tool calls per response.

💰

Cost tracking

Every call logs token counts and USD cost per model, per session, and per API key. Query totals via GET /usage.
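
The per-model, per-session, and per-key totals amount to grouping one call log three ways. A minimal sketch, assuming a flat log of calls: the `Call` record, the `totals` helper, the model names, and the prices are placeholders, not real provider rates or the shape of the GET /usage response.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder prices in USD per million tokens: (input, output). Not real rates.
PRICES = {"model-a": (1.00, 2.00), "model-b": (0.50, 1.50)}


@dataclass
class Call:
    api_key: str
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        """Cost of this call from its token counts and the price table."""
        price_in, price_out = PRICES[self.model]
        return (self.input_tokens * price_in + self.output_tokens * price_out) / 1_000_000


def totals(calls, field: str) -> dict:
    """Sum USD cost grouped by one attribute: 'model', 'session_id' or 'api_key'."""
    out = defaultdict(float)
    for call in calls:
        out[getattr(call, field)] += call.cost_usd
    return dict(out)


log = [
    Call("key-1", "s1", "model-a", 1_000_000, 500_000),
    Call("key-1", "s2", "model-b", 2_000_000, 0),
    Call("key-2", "s3", "model-a", 0, 1_000_000),
]
by_model = totals(log, "model")
by_key = totals(log, "api_key")
```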

🛡️

EU / US data sovereignty

All integrated providers are hosted in the United States or European Union. No Chinese-jurisdiction infrastructure in the chain.

🔬

Deep Research agent

Autonomous research via POST /research. Gemini plans, searches, and synthesizes multi-source reports in the background — poll for results when ready.
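
On the client side, polling for a background job might look like the sketch below. The page does not describe the status endpoint or its response fields, so `fetch_status` is an injected callable and the `status` values `running`, `completed`, and `failed` are assumptions made for the example.

```python
import time
from typing import Callable, Dict


def poll_until_done(
    fetch_status: Callable[[], Dict[str, str]],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Call fetch_status until it reports a finished job or attempts run out."""
    for attempt in range(max_attempts):
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_attempts} polls")


# Simulated job: two "running" answers, then a finished report.
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "report": "..."},
])
waits = []  # records each requested delay instead of actually sleeping
final = poll_until_done(lambda: next(responses), interval_s=2.0, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in normal use the default `time.sleep` applies.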

📐

Vector embeddings

3,072-dimension semantic embeddings via POST /embeddings. Single or batch (up to 100 texts). Ready to feed directly into any vector store or RAG pipeline.
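
Because a batch holds at most 100 texts, a larger corpus has to be split before it is sent. A small client-side helper for that, with `batches` and `MAX_BATCH` as illustrative names:

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 100  # batch limit stated above


def batches(texts: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` texts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


documents = [f"doc {i}" for i in range(250)]
sizes = [len(batch) for batch in batches(documents)]  # 100, 100, 50
```

Each yielded list would then go into one POST /embeddings call.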

One request.
Any model.

Switch between providers without changing your integration. The orchestrator handles routing, fallback, and context automatically.

curl · aiorchestrator.gntkh.com
curl -X POST \
  https://aiorchestrator.gntkh.com/chat \
  -H "Authorization: Bearer orc_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What happened in markets today?",
    "modelTarget": "gemini",
    "search": true,
    "session_id": "my-session-001"
  }'

# Response includes token counts, USD cost,
# and fallback info if triggered.
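
The same request can be built from Python with only the standard library. This sketch mirrors the fields of the curl call above. The `build_chat_request` helper is a hypothetical name, and the response format is not shown on this page, so the example stops short of sending the request.

```python
import json
import urllib.request

API_URL = "https://aiorchestrator.gntkh.com/chat"


def build_chat_request(api_key: str, prompt: str, model_target: str,
                       search: bool = False, session_id: str = "") -> urllib.request.Request:
    """Build (but do not send) a POST /chat request with a JSON body."""
    payload = {"prompt": prompt, "modelTarget": model_target, "search": search}
    if session_id:
        payload["session_id"] = session_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("orc_...", "What happened in markets today?", "gemini",
                             search=True, session_id="my-session-001")
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```

Switching providers means changing only the `model_target` argument.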

Pay once. Own your infrastructure.

No subscriptions. You get the source code — deploy it yourself on Cloudflare with your own API keys.

Enterprise

Multi-tenant · custom branding · SLA

Custom
Contact us for a quote
  • Everything in Starter
  • Multi-tenant architecture (one worker, multiple clients)
  • Admin dashboard with usage-based billing
  • White-label branding
  • SLA + priority support
  • Custom model integrations on request