One endpoint
Point compatible SDKs at a single gateway URL and route requests without rewriting every integration.
ngrok-powered model routing
Route AI requests through a single ngrok endpoint to providers such as OpenAI, Anthropic, Google, and self-hosted models. The gateway adds provider failover, SDK compatibility, gateway keys, and visibility into the traffic moving between your app and the models it depends on.
AI apps move faster when provider details, auth, routing, and traffic inspection live at the edge instead of being hard-coded into every client.
Target one gateway base URL from any compatible SDK instead of wiring each provider's endpoint into your clients.
Retry across configured models, providers, or keys when a selected route cannot complete the request.
Inspect, secure, and observe AI traffic before it reaches cloud providers or local inference servers.
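The failover behavior above can be pictured as a simple retry chain: try each configured route in order and fall through to the next when one cannot complete the request. An illustrative Python sketch, not the gateway's actual implementation; the route names and error are hypothetical placeholders:

```python
# Illustrative only: a toy failover chain like the one the gateway
# runs for you. Route names and failure modes are hypothetical.

class RouteError(Exception):
    """Raised when a selected route cannot complete the request."""

def call_route(route: str, prompt: str) -> str:
    # Stand-in for forwarding to a provider; the first route fails
    # here to demonstrate the fallback path.
    if route == "openai/gpt-4o":
        raise RouteError("rate limited")
    return f"{route}: response to {prompt!r}"

def complete_with_failover(routes: list[str], prompt: str) -> str:
    last_error = None
    for route in routes:
        try:
            return call_route(route, prompt)
        except RouteError as exc:
            last_error = exc  # fall through to the next configured route
    raise RuntimeError(f"all routes failed: {last_error}")

result = complete_with_failover(
    ["openai/gpt-4o", "anthropic/claude-sonnet"], "hello"
)
```

Here the first route raises, so the chain falls through and the second route answers; only when every configured route fails does the error surface to the caller.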
The gateway validates the request, chooses a model/provider path, forwards the call, and returns the response to the application.
Your app sends an SDK or HTTP request to the gateway endpoint.
ngrok validates the AI Gateway API key before routing traffic.
The gateway selects a model, provider, or failover chain.
The request is forwarded to a cloud provider or self-hosted model.
The model output returns through the gateway to your app.
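Concretely, the request your app sends in the first step is ordinary HTTP: an Authorization header carrying the gateway key and a JSON chat-completions body the gateway routes on. A minimal sketch of that payload, using the placeholder URL and key from this page:

```python
import json

# Placeholder gateway endpoint and AI Gateway API key from this page.
GATEWAY_URL = "https://your-ai-gateway.ngrok.app/v1/chat/completions"
GATEWAY_KEY = "ng-xxxxx-g1-xxxxx"

# The gateway authenticates on the key, then routes on the model field.
headers = {
    "Authorization": f"Bearer {GATEWAY_KEY}",
    "Content-Type": "application/json",
}
body = json.dumps({
    "model": "ngrok/auto",  # let the gateway choose the model/provider path
    "messages": [{"role": "user", "content": "Hello from the gateway"}],
})
```

Any HTTP client can POST this body to the gateway URL; the SDK example below builds the same request for you.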
Keep your existing SDK shape. Change the base URL, use an AI Gateway API key, and let the gateway handle routing behind the scenes.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",
)

response = client.chat.completions.create(
    model="ngrok/auto",
    messages=[
        {"role": "user", "content": "Hello from the gateway"}
    ],
)

print(response.choices[0].message.content)
Use the gateway as a control point for AI traffic across managed providers, bring-your-own keys, and local inference.
Works with popular AI SDK patterns by changing the base URL.
Use routes such as ngrok/auto for automatic model selection and flexible request handling.
Use gateway keys without exposing provider credentials to clients.
Connect additional providers and account-specific access.
Route to local inference runtimes alongside cloud providers.
Restrict which clients can reach specific gateway routes.
Apply request or response modification policies at the gateway.
Inspect AI request behavior, latency, headers, and routing outcomes.
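Two of the ideas above, gateway keys that hide provider credentials and per-client route restrictions, amount to a lookup the gateway performs before forwarding. An illustrative Python sketch with hypothetical keys and route names, not ngrok's actual data model:

```python
# Illustrative only: keys, routes, and table shapes are hypothetical.

PROVIDER_KEYS = {            # lives only at the gateway, never in clients
    "openai": "sk-provider-secret",
}

GATEWAY_KEYS = {             # what clients actually hold
    "ng-xxxxx-g1-xxxxx": {"allowed_routes": {"openai"}},
}

def resolve(gateway_key: str, route: str) -> str:
    """Map a client's gateway key to a provider credential, if allowed."""
    grant = GATEWAY_KEYS.get(gateway_key)
    if grant is None or route not in grant["allowed_routes"]:
        raise PermissionError("client may not reach this route")
    return PROVIDER_KEYS[route]  # provider credential stays server-side
```

Clients only ever see the gateway key; rotating or revoking a provider credential is then a gateway-side change with no client redeploys.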