ngrok-powered model routing

One gateway for every AI model.

Route AI requests through a single ngrok endpoint to providers such as OpenAI, Anthropic, Google, and self-hosted models. The gateway adds provider failover, SDK compatibility, gateway keys, and visibility into the traffic moving between your app and the models it depends on.

Why put a gateway in front of AI?

AI apps move faster when provider details, auth, routing, and traffic inspection live at the edge instead of being hard-coded into every client.

01

One endpoint

Point compatible SDKs at a single gateway URL and route requests without rewriting every integration.

02

Automatic failover

Retry across configured models, providers, or keys when a selected route cannot complete the request.

03

Traffic visibility

Inspect, secure, and observe AI traffic before it reaches cloud providers or local inference servers.
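The failover behavior described above runs server-side in the gateway. As an illustration of the pattern only, here is a minimal client-side sketch; the provider names and the `call_model` stub are hypothetical, not ngrok internals:

```python
# Sketch of the failover pattern: try each route in order,
# fall through on failure, surface an error only if all routes fail.
# `call_model` is a stand-in for a real provider call.

class ProviderError(Exception):
    """Raised when a provider cannot complete the request."""

def call_model(provider: str, prompt: str) -> str:
    # Stub: pretend the primary provider is unavailable.
    if provider == "primary":
        raise ProviderError("primary unavailable")
    return f"{provider} answered: {prompt!r}"

def complete_with_failover(chain: list[str], prompt: str) -> str:
    """Return the first successful response from the chain."""
    last_error: Exception | None = None
    for provider in chain:
        try:
            return call_model(provider, prompt)
        except ProviderError as exc:
            last_error = exc  # move on to the next configured route
    raise RuntimeError("all routes failed") from last_error

print(complete_with_failover(["primary", "fallback"], "hello"))
```

The same ordering logic applies whether the chain is a list of models, providers, or keys.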

Request flow

The gateway validates the request, chooses a model/provider path, forwards the call, and returns the response to the application.

  • Step 1

    App request

    Your app sends an SDK or HTTP request to the gateway endpoint.

  • Step 2

    Gateway auth

    ngrok validates the AI Gateway API key before routing traffic.

  • Step 3

    Route selection

    The gateway selects a model, provider, or failover chain.

  • Step 4

    Provider call

    The request is forwarded to a cloud provider or self-hosted model.

  • Step 5

    Response

    The model output returns through the gateway to your app.
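The five steps above can be sketched as a single request-handling loop. Everything here is illustrative: the key value, route table, and `forward` stub are hypothetical placeholders, not ngrok's implementation:

```python
# Illustrative sketch of the gateway request flow; not ngrok internals.

VALID_KEYS = {"ng-demo-key"}                      # Step 2: gateway auth
ROUTES = {"ngrok/auto": ["openai", "anthropic"]}  # Step 3: failover chain

def forward(provider: str, payload: dict) -> dict:
    # Step 4: stand-in for the real cloud or self-hosted provider call.
    return {"provider": provider, "echo": payload["messages"][-1]["content"]}

def handle_request(api_key: str, payload: dict) -> dict:
    # Step 1: the app's request arrives with a gateway key and model name.
    if api_key not in VALID_KEYS:
        raise PermissionError("invalid gateway key")
    for provider in ROUTES[payload["model"]]:  # Step 3: walk the chain
        try:
            return forward(provider, payload)  # Step 5: response flows back
        except Exception:
            continue  # try the next provider in the chain
    raise RuntimeError("all providers failed")

resp = handle_request("ng-demo-key", {
    "model": "ngrok/auto",
    "messages": [{"role": "user", "content": "Hello from the gateway"}],
})
print(resp)
```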

SDK-compatible by design

Keep your existing SDK shape. Change the base URL, use an AI Gateway API key, and let the gateway handle routing behind the scenes.

openai_gateway.py
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",
)

response = client.chat.completions.create(
    model="ngrok/auto",
    messages=[
        {"role": "user", "content": "Hello from the gateway"}
    ],
)

print(response.choices[0].message.content)
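Because the gateway speaks an OpenAI-style HTTP interface, the SDK is optional. A hedged sketch of the equivalent raw request follows; the URL and key are placeholders, and the request is built but deliberately not sent, so the sketch runs without a live gateway:

```python
import json
import urllib.request

# Placeholder endpoint and key; substitute your own gateway values.
GATEWAY_URL = "https://your-ai-gateway.ngrok.app/v1/chat/completions"
API_KEY = "ng-xxxxx-g1-xxxxx"

payload = {
    "model": "ngrok/auto",
    "messages": [{"role": "user", "content": "Hello from the gateway"}],
}

req = urllib.request.Request(
    GATEWAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending is left out so the sketch runs offline:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.get_method(), req.full_url)
```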

Gateway capabilities

Use the gateway as a control point for AI traffic across managed providers, bring-your-own keys, and local inference.

SDK compatibility

Works with popular AI SDK patterns by changing the base URL.

Automatic selection

Use routes such as model auto-selection (for example, model="ngrok/auto") to let the gateway choose a route per request.

Managed keys

Use gateway keys without exposing provider credentials to clients.

Bring your own keys

Connect additional providers and account-specific access.

Self-hosted models

Route to local inference runtimes alongside cloud providers.

Access control

Restrict which clients can reach specific gateway routes.

Content controls

Apply request or response modification policies at the gateway.

Observability

Inspect AI request behavior, latency, headers, and routing outcomes.

Live demo

Try the public gateway page.

This static demo is published on Cloudflare Pages and mirrored locally on the lab network for quick iteration.