FAQ - ModelRunner Docs

Accounts & authentication

How do I authenticate API requests?

Pass your key in the Authorization header with the Key scheme: Authorization: Key <your_key>. See API keys for how to create and rotate keys.
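As a minimal sketch of the header format above, the following builds (but does not send) an authenticated request with Python's standard library. The base URL and the build_request helper are hypothetical placeholders for illustration; only the Authorization: Key scheme and the /files path come from this page.

```python
import urllib.request

# Hypothetical base URL (assumption): substitute the real API host.
BASE_URL = "https://api.modelrunner.example"


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Return a Request that carries the key using the 'Key' scheme."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Key {api_key}"},
    )


# Build a request for the files listing without sending it.
req = build_request("/files", api_key="YOUR_KEY")
print(req.get_header("Authorization"))  # prints: Key YOUR_KEY
```

Keep the key on the server side (for example in an environment variable) rather than hard-coding it as in this sketch.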

Can I use ModelRunner directly from the browser?

Not with your API key — it would be exposed to end users. Use the server-proxy pattern shown in the JavaScript client guide: your server keeps the key, and the browser routes requests through /api/modelrunner/proxy.

Can my AI assistant call ModelRunner directly?

Yes — connect the MCP server. Claude Desktop, Claude Code, and Cursor support it out of the box.

Billing & limits

How is billing calculated?

Models are priced per request, per second of GPU time, or per output, depending on the model. The price is settled when a request reaches COMPLETED. Failed and cancelled requests are not billed.

Do I pay if a request fails?

No. A request that transitions to FAILED or CANCELLED is not charged. A request whose output failed schema validation (422) is also not charged — you see the upstream error and the balance is unaffected.
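If you mirror charges in your own ledger, the rules above can be sketched as a small predicate. This is an illustrative assumption about how a client might reconcile usage, not the platform's billing code; the is_billed name and its parameters are hypothetical, and only the COMPLETED, FAILED, and CANCELLED states and the 422 case come from this page.

```python
from typing import Optional


def is_billed(status: str, result_status_code: Optional[int] = None) -> bool:
    """Client-side guess at whether a request was charged.

    status: the request state reported by the API.
    result_status_code: HTTP status of the result fetch, if one was made.
    """
    # A result fetch that returns 422 (schema validation failure) is not charged.
    if result_status_code == 422:
        return False
    # Only requests that reach COMPLETED are settled; FAILED and CANCELLED are free.
    return status == "COMPLETED"
```

For example, is_billed("FAILED") and is_billed("COMPLETED", 422) both evaluate to False, while is_billed("COMPLETED", 200) evaluates to True.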

What happens when I run out of credits?

Submitting a new request returns HTTP 402 Insufficient Balance. Top up credits and retry the request — no other action is needed.

Are there rate limits?

There are no per-second request limits today. Submit as many concurrent requests as you like — they queue and process as provider capacity allows. If a provider returns 429, the API passes it through so your client can back off.
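One way a client could back off on a passed-through 429 is exponential backoff with jitter. The sketch below is an assumption about client behavior, not a documented ModelRunner client: send is a hypothetical callable that submits the request and returns the HTTP status code, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable


def submit_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429 or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Full jitter: wait a random time up to the capped exponential delay.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, cap))
        status = send()
    return status
```

The sleep parameter is injectable so the function can be exercised in tests without real waiting. The function returns the last status it saw, so a caller can still detect a 429 that persisted through every attempt.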

Files & inputs

What file types can I upload?

Any binary type. ModelRunner storage accepts whatever content_type you declare on the upload — what matters is whether the target model accepts that type. Common types include image/png, image/jpeg, image/webp, video/mp4, video/webm, audio/mpeg, audio/wav, application/pdf.

What's the largest file I can upload?

Single-part uploads work up to S3’s ~5 GB per-PUT limit. For anything larger, or when you need resumability, use the multipart upload flow. The MCP server’s upload_file tool caps inline payloads at 200 MiB — larger files should use the direct multipart endpoints.
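The size thresholds above can be turned into a simple decision helper. This is a sketch under stated assumptions: choose_upload_flow and its return labels are hypothetical names, and the "~5 GB" figure is read conservatively as 5 * 1000**3 bytes.

```python
MIB = 1024 ** 2

# Conservative reading of the "~5 GB" per-PUT limit (assumption).
SINGLE_PUT_LIMIT = 5 * 1000 ** 3
# Inline payload cap of the MCP server's upload_file tool.
MCP_INLINE_LIMIT = 200 * MIB


def choose_upload_flow(size_bytes: int, via_mcp: bool = False, resumable: bool = False) -> str:
    """Return 'mcp_inline', 'single_part', or 'multipart' for a file of the given size."""
    if via_mcp:
        # Files over the inline cap should go through the direct multipart endpoints.
        return "mcp_inline" if size_bytes <= MCP_INLINE_LIMIT else "multipart"
    if resumable or size_bytes > SINGLE_PUT_LIMIT:
        return "multipart"
    return "single_part"
```

For instance, a 50 MiB file sent through the MCP tool maps to "mcp_inline", a 1 GiB direct upload maps to "single_part", and anything that needs resumability maps to "multipart" regardless of size.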

How long are uploaded files retained?

Uploaded files and model outputs are kept indefinitely under your account. You can list and delete them via the files API (GET /files, DELETE /files/:id).

Requests & lifecycle

Does ModelRunner support webhooks?

Not yet. Use the Server-Sent Events stream (GET /requests/stream) for push-style updates without polling, or poll status_url directly. Webhook delivery is on the roadmap.
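To illustrate consuming the stream, here is a minimal parser for the standard Server-Sent Events wire format (event: and data: fields, with a blank line ending each event). The event name and JSON payload in the usage lines are hypothetical; this page does not specify what GET /requests/stream actually emits.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the format strips one leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical sample: the real event names and payload shape may differ.
sample = ["event: status\n", 'data: {"status": "COMPLETED"}\n', "\n"]
events = list(parse_sse(sample))
```

In a real client the lines would come from a streaming HTTP response opened with the Authorization header rather than from an in-memory list.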

What's my position in the queue?

The queue_position field in status responses is currently always 0 — real queue depth is not tracked yet. Use the SSE stream to receive updates the moment your request transitions.

How long can a request stay in flight?

A request can stay in flight for up to 6 hours and 5 finalization attempts. Beyond that the platform force-fails the request. See request lifecycle.

Can I cancel a running request?

Yes. Send a GET request to the cancel_url returned when you created the request. Cancellation is immediate if the provider hasn’t started and best-effort otherwise. See request lifecycle.

What happens if my client crashes mid-request?

Nothing is lost. The platform’s background finalization sweep will still complete the request and settle billing. Retrieve it later with GET /requests/{requestId} or list your history.

Errors

What does 422 mean on a result fetch?

The provider returned output that didn’t match the model’s declared schema, or the provider failed after the request was already billing-normalized. Read error for the human-readable failure and details.validationErrors for per-field issues. You are not billed. See errors.
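A client might surface both fields like this. The sketch assumes the 422 body is JSON with a top-level error string and a details.validationErrors list, as described above; the shape of each list entry is not documented here, so entries that are not strings are serialized as-is. summarize_422 and the sample body are hypothetical.

```python
import json
from typing import List, Tuple


def summarize_422(body: str) -> Tuple[str, List[str]]:
    """Return (human-readable error, per-field issues) from a 422 response body."""
    payload = json.loads(body)
    message = payload.get("error", "")
    # Tolerate a missing or null details object.
    issues = (payload.get("details") or {}).get("validationErrors") or []
    # Entry shape is unknown, so non-string entries are kept as JSON text.
    return message, [i if isinstance(i, str) else json.dumps(i) for i in issues]


# Hypothetical body for illustration only.
sample_body = json.dumps({
    "error": "Output failed schema validation",
    "details": {"validationErrors": [{"path": "images[0].url", "message": "required"}]},
})
message, issues = summarize_422(sample_body)
```

Logging both values gives you the overall failure and the specific fields that did not validate.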

Which errors are safe to retry?

Only 429 (with backoff) and transient 5xx. Treat 400, 401, 402, 403, 404, and 422 as deterministic failures — retrying with the same input will produce the same error.
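The rule above fits in a one-line predicate. This sketch treats every 5xx as potentially transient, which is an assumption; the page does not say which 5xx codes qualify, so narrow the range if some server errors turn out to be deterministic for your workload.

```python
def is_retryable(status_code: int) -> bool:
    """True for 429 and 5xx responses; everything else is treated as deterministic."""
    return status_code == 429 or 500 <= status_code <= 599
```

With this helper, 400, 401, 402, 403, 404, and 422 all evaluate to False, so a retry loop can stop immediately and report the error instead of resubmitting the same input.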
