A production-ready REST orchestration layer that unifies Claude, Gemini, GPT-4.1, Mistral, Grok, Llama, Cohere, Qwen, and Perplexity behind a single endpoint — with fallback, memory, search, deep research, and embeddings built in.
What's included
Built on Cloudflare Workers and D1 — globally distributed, serverless, and ready for production from day one.
One POST /chat call to reach any of the nine models. Switch providers by changing a single parameter.
If the primary model fails, the system transparently retries with the next provider in the chain, so callers see neither downtime nor errors.
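The failover behavior can be sketched as a simple provider chain. This is an illustrative sketch, not the orchestrator's actual code: the chain ordering and the `call_provider` stand-in are assumptions for demonstration.

```python
# Illustrative fallback chain: try each provider in order and return the
# first successful reply, along with a record of any failovers that occurred.
# The ordering below is an example, not the orchestrator's real chain.
PROVIDER_CHAIN = ["claude", "gemini", "gpt-4.1", "mistral"]

def chat_with_fallback(prompt, call_provider, chain=PROVIDER_CHAIN):
    errors = {}
    for provider in chain:
        try:
            reply = call_provider(provider, prompt)
            return {"provider": provider, "reply": reply, "fallbacks": errors}
        except Exception as exc:  # a real system would catch specific errors
            errors[provider] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def _flaky(provider, prompt):
    # Stand-in upstream call: the first provider times out.
    if provider == "claude":
        raise TimeoutError("upstream timeout")
    return f"{provider}: ok"

result = chat_with_fallback("hello", _flaky)  # fails over from claude to gemini
```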
Each conversation session persists its last 10 turns in Cloudflare D1. Context follows the user across requests.
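The 10-turn window comes from the description above; a minimal sketch of how such a rolling session memory behaves, with an in-memory dict standing in for the D1 table:

```python
# Sketch of per-session memory: append a turn, then trim so only the
# last 10 turns survive. `sessions` stands in for the D1-backed store.
MAX_TURNS = 10

def append_turn(sessions, session_id, role, content):
    history = sessions.setdefault(session_id, [])
    history.append({"role": role, "content": content})
    # Drop the oldest turns beyond the 10-turn window.
    del history[:-MAX_TURNS]
    return history
```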
Enable real-time search on any request with search: true. Powered by Tavily — up to 5 agentic tool calls per response.
Every call logs token counts and USD cost per model, per session, and per API key. Query totals via GET /usage.
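The accounting model can be sketched as a ledger keyed by the three scopes the doc lists (model, session, API key). Field names and values here are illustrative, not the actual `GET /usage` schema:

```python
# Sketch of usage accounting: each call's token count and USD cost is
# added to per-model, per-session, and per-key buckets, mirroring the
# totals that GET /usage would report. Field names are assumptions.
from collections import defaultdict

def record_usage(ledger, *, api_key, session_id, model, tokens, usd):
    for scope in (("model", model), ("session", session_id), ("key", api_key)):
        bucket = ledger[scope]
        bucket["tokens"] += tokens
        bucket["usd"] += usd

ledger = defaultdict(lambda: {"tokens": 0, "usd": 0.0})
record_usage(ledger, api_key="orc_demo", session_id="s1",
             model="gemini", tokens=1200, usd=0.004)
record_usage(ledger, api_key="orc_demo", session_id="s1",
             model="claude", tokens=800, usd=0.006)
```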
All integrated providers are hosted in the United States or European Union. No Chinese-jurisdiction infrastructure in the chain.
Autonomous research via POST /research. Gemini plans, searches, and synthesizes multi-source reports in the background — poll for results when ready.
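The submit-then-poll pattern described above can be sketched as follows; `submit` and `fetch_status` are stand-ins for the real HTTP calls, and the `status`/`report` field names are assumptions:

```python
# Sketch of the background-research flow: POST /research kicks off a job,
# then the client polls until the job reports "done" and reads the report.
import time

def poll_research(submit, fetch_status, interval=0.0, max_attempts=50):
    job_id = submit()
    for _ in range(max_attempts):
        status = fetch_status(job_id)
        if status.get("status") == "done":
            return status["report"]
        time.sleep(interval)  # back off between polls
    raise TimeoutError(f"research job {job_id} not ready")
```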
3,072-dimension semantic embeddings via POST /embeddings. Single or batch (up to 100 texts). Ready to feed directly into any vector store or RAG pipeline.
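Since the endpoint accepts at most 100 texts per request, a larger corpus needs client-side batching. A minimal sketch, assuming only the 100-text limit stated above:

```python
# Split an arbitrary list of texts into batches that respect the
# 100-texts-per-request limit of POST /embeddings.
MAX_BATCH = 100

def batch_texts(texts, size=MAX_BATCH):
    return [texts[i:i + size] for i in range(0, len(texts), size)]
```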
Simple by design
Switch between providers without changing your integration. The orchestrator handles routing, fallback, and context automatically.
curl -X POST https://aiorchestrator.gntkh.com/chat \
  -H "Authorization: Bearer orc_..." \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What happened in markets today?",
    "modelTarget": "gemini",
    "search": true,
    "session_id": "my-session-001"
  }'
# Response includes token counts, USD cost,
# and fallback info if triggered.
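The same call from Python. This sketch only assembles the headers and JSON body; the endpoint and field names are taken from the curl example, and actually sending the request requires a real API key:

```python
# Build the POST /chat request shown above (headers + JSON payload).
import json

def build_chat_request(api_key, prompt, model_target,
                       search=False, session_id=None):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"prompt": prompt, "modelTarget": model_target, "search": search}
    if session_id:
        payload["session_id"] = session_id
    return headers, json.dumps(payload)
```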
Pricing
No subscriptions. You get the source code — deploy it yourself on Cloudflare with your own API keys.
Full source code, ready to deploy
📦 What you receive
Multi-tenant · custom branding · SLA