Accounts & authentication

Pass your key in the Authorization header with the Key scheme: Authorization: Key <your_key>. See API keys for how to create and rotate keys.
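The Key scheme described above can be sketched as a small helper. This is a minimal illustration, not an official client: the function name auth_headers is hypothetical, and only the header format comes from the text.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers using the Key scheme: 'Authorization: Key <your_key>'."""
    # Reject empty keys and stray whitespace (for example a trailing newline
    # picked up from an env file), which would otherwise surface as a 401.
    if not api_key or api_key != api_key.strip():
        raise ValueError("api_key must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Key {api_key}"}
```

Keep the key on the server and pass the returned dictionary as the headers of whichever HTTP client you use.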
Do not call the API from the browser with your API key: it would be exposed to end users. Use the server-proxy pattern shown in the JavaScript client guide: your server keeps the key, and the browser routes requests through /api/modelrunner/proxy.
To use ModelRunner from an AI assistant or editor, connect the MCP server. Claude Desktop, Claude Code, and Cursor support it out of the box.

Billing & limits

Models are priced per request, per second of GPU time, or per output, depending on the model. The price is settled when a request reaches COMPLETED. Failed and cancelled requests are not billed.
A request that transitions to FAILED or CANCELLED is not charged. A request whose output failed schema validation (422) is also not charged: you see the upstream error and the balance is unaffected.
When your balance runs out, submitting a new request returns HTTP 402 Insufficient Balance. Top up credits and retry the request; no other action is needed.
There are no per-second request limits today. Submit as many concurrent requests as you like — they queue and process as provider capacity allows. If a provider returns 429, the API passes it through so your client can back off.

Files & inputs

You can upload any binary type. ModelRunner storage accepts whatever content_type you declare on the upload; what matters is whether the target model accepts that type. Common types include image/png, image/jpeg, image/webp, video/mp4, video/webm, audio/mpeg, audio/wav, and application/pdf.
Single-part uploads work up to S3’s ~5 GB per-PUT limit. For anything larger, or when you need resumability, use the multipart upload flow. The MCP server’s upload_file tool caps inline payloads at 200 MiB — larger files should use the direct multipart endpoints.
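The size rules above can be expressed as a small decision helper. This is a sketch under stated assumptions: the function name and return labels are hypothetical, and the single-part threshold is rounded to 5 * 10**9 bytes because the text only gives "~5 GB".

```python
MIB = 1024 ** 2
GB = 10 ** 9

MCP_INLINE_CAP = 200 * MIB   # cap stated for the MCP server's upload_file tool
SINGLE_PART_CAP = 5 * GB     # assumption: the text says "~5 GB", so this is approximate


def choose_upload_path(size_bytes: int, via_mcp: bool = False, need_resume: bool = False) -> str:
    """Return a hypothetical label: 'single_part', 'mcp_inline', or 'multipart'."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Resumability or very large files always call for the multipart flow.
    if need_resume or size_bytes > SINGLE_PART_CAP:
        return "multipart"
    # Through the MCP tool, anything above the inline cap goes to multipart.
    if via_mcp and size_bytes > MCP_INLINE_CAP:
        return "multipart"
    return "mcp_inline" if via_mcp else "single_part"
```

For example, a 300 MiB video sent through the MCP server would be routed to the multipart endpoints, while the same file uploaded directly fits in a single PUT.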
Uploaded files and model outputs are kept indefinitely under your account. You can list and delete them via the files API (GET /files, DELETE /files/:id).

Requests & lifecycle

Webhooks are not available yet. Use the Server-Sent Events stream (GET /requests/stream) for push-style updates without polling, or poll status_url directly. Webhook delivery is on the roadmap.
The queue_position field in status responses is currently always 0 — real queue depth is not tracked yet. Use the SSE stream to receive updates the moment your request transitions.
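A Server-Sent Events stream such as the one mentioned above is plain text: `data:` lines grouped into events that end with a blank line. The sketch below parses such lines into dictionaries. It assumes each event's data is a JSON object; the text does not specify the payload shape, so treat field names like status in the usage note as hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each SSE event found in an iterable of text lines."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
```

Feed it the decoded lines of the streaming response from your HTTP client and stop reading once an event reports a terminal state for your request, for example a hypothetical `event["status"] == "COMPLETED"`.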
A request can run for up to 6 hours and 5 finalization attempts. Beyond that the platform force-fails the request. See request lifecycle.
To cancel a request, GET the cancel_url returned when you created it. Cancellation is immediate if the provider hasn’t started, and best-effort otherwise. See request lifecycle.
If your client disconnects or stops polling before a request finishes, nothing is lost. The platform’s background finalization sweep will still complete the request and settle billing. Retrieve it later with GET /requests/{requestId} or list your history.

Errors

A 422 means the provider returned output that didn’t match the model’s declared schema, or the provider failed after the request was already billing-normalized. Read error for the human-readable failure and details.validationErrors for per-field issues. You are not billed. See errors.
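Reading a 422 body as described above might look like the following sketch. The error and details.validationErrors fields come from the text; the shape of each validation entry is not documented here, so the path and message keys are hypothetical and the code falls back to printing the raw entry.

```python
def summarize_422(body: dict) -> list[str]:
    """Flatten a 422 response body into human-readable lines."""
    lines = [str(body.get("error", "unknown error"))]
    details = body.get("details") or {}
    for issue in details.get("validationErrors") or []:
        if isinstance(issue, dict):
            # Hypothetical keys: adjust to the actual entry shape you receive.
            path = issue.get("path", "?")
            message = issue.get("message", issue)
            lines.append(f"{path}: {message}")
        else:
            lines.append(str(issue))
    return lines
```

Log the result and surface it to the caller rather than retrying, since a 422 is listed among the deterministic failures.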
Retry only 429 (with backoff) and transient 5xx. Treat 400, 401, 402, 403, 404, and 422 as deterministic failures: retrying with the same input will produce the same error.
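The retry rule above can be sketched as a status classifier plus a backoff schedule. Both function names are hypothetical. The text does not say which 5xx codes count as transient or which backoff curve to use, so this sketch treats every 5xx as retryable and uses exponential backoff with full jitter as one common choice.

```python
import random
from typing import Callable

# Statuses the text lists as deterministic failures.
NON_RETRYABLE = {400, 401, 402, 403, 404, 422}


def should_retry(status: int) -> bool:
    """Return True for 429 and 5xx responses, False for everything else."""
    if status in NON_RETRYABLE:
        return False
    # Assumption: all 5xx responses are treated as transient.
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    # The exponential ceiling is capped, then scaled by a random factor in [0, 1).
    return rng() * min(cap, base * 2 ** attempt)
```

A caller would loop while should_retry(status) holds, sleep for backoff_delay(attempt), and give up after a fixed number of attempts. A 402 is not retried automatically because it needs a credit top-up first.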