A production-ready REST orchestration layer that unifies Claude, Gemini, GPT-4.1, Mistral, Grok, Llama, and Perplexity behind a single endpoint — with fallback, memory, search, and cost tracking built in.
What's included
Built on Cloudflare Workers and D1 — globally distributed, serverless, and ready for production from day one.
One POST /chat call to reach any of the seven models. Switch providers by changing a single parameter.
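If the primary model fails, the orchestrator automatically retries the request with the next provider in the chain, so a single provider outage does not surface as a failed call. The response reports when a fallback was triggered.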
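The fallback behaviour described above can be pictured as a loop over an ordered provider chain. The following TypeScript is a minimal sketch under stated assumptions, not the product's actual code: `Provider`, `FallbackResult`, and `callWithFallback` are hypothetical names chosen for illustration.

```typescript
// Hypothetical types for illustration; the real orchestrator's internals may differ.
type Provider = {
  name: string;
  call: (prompt: string) => Promise<string>;
};

type FallbackResult = {
  text: string;
  modelUsed: string;
  fallbackTriggered: boolean;
  failed: string[]; // providers that errored before one succeeded
};

// Try each provider in order and return the first successful response.
async function callWithFallback(
  chain: Provider[],
  prompt: string
): Promise<FallbackResult> {
  const failed: string[] = [];
  for (const provider of chain) {
    try {
      const text = await provider.call(prompt);
      return {
        text,
        modelUsed: provider.name,
        fallbackTriggered: failed.length > 0,
        failed,
      };
    } catch {
      // Record the failure and move on to the next provider in the chain.
      failed.push(provider.name);
    }
  }
  // Only reached when every provider in the chain has failed.
  throw new Error(`All providers failed: ${failed.join(", ")}`);
}
```

In this sketch the caller only sees an error when the whole chain is exhausted; otherwise it receives the text plus a record of which providers were skipped.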
Each conversation session persists its last 10 turns in Cloudflare D1. Context follows the user across requests.
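The 10-turn window can be illustrated with a small sliding-window store. This is a sketch only: it uses an in-memory `Map` as a stand-in for the D1 table, assumes a "turn" is a single message, and the names `Turn`, `appendTurn`, and `getContext` are hypothetical.

```typescript
// Hypothetical shape of a stored turn; the real schema is not shown in this document.
type Turn = { role: "user" | "assistant"; content: string };

const MAX_TURNS = 10; // window size stated above

// In-memory stand-in for the D1 table; keys are session IDs.
const sessions = new Map<string, Turn[]>();

// Append a turn and keep only the most recent MAX_TURNS entries.
function appendTurn(sessionId: string, turn: Turn): Turn[] {
  const history = [...(sessions.get(sessionId) ?? []), turn].slice(-MAX_TURNS);
  sessions.set(sessionId, history);
  return history;
}

// Return the stored context for a session, or an empty list if none exists.
function getContext(sessionId: string): Turn[] {
  return sessions.get(sessionId) ?? [];
}
```

Older turns simply fall off the front of the list, so the context sent to the model stays bounded regardless of how long the session runs.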
Enable real-time search on any request with search: true. Powered by Tavily — up to 5 agentic tool calls per response.
Every call logs token counts and USD cost per model, per session, and per API key. Query totals via GET /usage.
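The cost accounting can be sketched as a per-call price calculation followed by grouping along one dimension. This is illustrative only: the `Usage` shape, the function names, and the model names are hypothetical, and the prices in `PRICES` are placeholders, not real provider pricing.

```typescript
// Hypothetical usage record for one call.
type Usage = {
  apiKey: string;
  sessionId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
};

// Placeholder prices in USD per million tokens; NOT real provider pricing.
const PRICES: Record<string, { input: number; output: number }> = {
  "model-a": { input: 2, output: 8 },
  "model-b": { input: 1, output: 4 },
};

// USD cost of a single call from its token counts.
function costUsd(u: Usage): number {
  const price = PRICES[u.model];
  if (!price) throw new Error(`No price configured for ${u.model}`);
  return (u.inputTokens * price.input + u.outputTokens * price.output) / 1_000_000;
}

// Sum cost along one dimension: per model, per session, or per API key.
function totalsBy(
  log: Usage[],
  key: "model" | "sessionId" | "apiKey"
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const u of log) {
    totals[u[key]] = (totals[u[key]] ?? 0) + costUsd(u);
  }
  return totals;
}
```

Keeping the raw per-call records and aggregating on read is one simple way to serve all three breakdowns from the same log.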
All integrated providers are hosted in the United States or European Union. No Chinese-jurisdiction infrastructure in the chain.
Simple by design
Switch between providers without changing your integration. The orchestrator handles routing, fallback, and context automatically.
curl -X POST \
  https://aiorchestrator.gntkh.com/chat \
  -H "Authorization: Bearer orc_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What happened in markets today?",
    "modelTarget": "perplexity",
    "search": true,
    "session_id": "my-session-001"
  }'

# Response includes citations, token counts,
# USD cost, and fallback info if triggered.
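The same request can be assembled from TypeScript. The sketch below only builds the arguments for a `POST /chat` call, using the field names from the curl example; `ChatRequest` and `buildChatRequest` are hypothetical helper names, and the response shape is deliberately left untyped because it is not specified here.

```typescript
// Request fields taken from the curl example; which of them are optional is an assumption.
type ChatRequest = {
  prompt: string;
  modelTarget: string;
  search?: boolean;
  session_id?: string;
};

// Build the URL and fetch options for POST /chat.
// apiKey is the bearer token (the "orc_..." value in the curl example).
function buildChatRequest(baseUrl: string, apiKey: string, body: ChatRequest) {
  return {
    url: `${baseUrl}/chat`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed to `fetch(url, init)` in any runtime that provides the Fetch API. Switching providers means changing only the `modelTarget` value; the rest of the request stays the same.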
Pricing
No subscriptions. You get the source code — deploy it yourself on Cloudflare with your own API keys.
Full source code, ready to deploy
📦 What you receive
Multi-tenant · custom branding · SLA