List Models
Returns all models currently available on this Ollama instance.
GET /api/tags
Response schema
cURL
field                 type     description
models                array    List of installed model objects
models[].name         string   Model identifier, e.g. "qwen2.5-coder:7b-instruct-q4_K_M"
models[].size         integer  Size on disk in bytes
models[].digest       string   SHA-256 digest of the model
models[].modified_at  string   ISO 8601 last-modified timestamp
curl -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  https://YOUR_OLLAMA_HOST/api/tags | jq
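To grab just the model names for scripting (e.g. to feed the endpoints below), filter the response with jq, based on the schema above:
# Print installed model names, one per line
curl -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  https://YOUR_OLLAMA_HOST/api/tags | jq -r '.models[].name'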
Generate Completion
Generate a response for a given prompt. Supports streaming token-by-token output.
POST /api/generate
Parameters
cURL
parameter            type     description
model (required)     string   Name of the model to use
prompt (required)    string   The prompt to generate a response for
stream               boolean  If true, returns a stream of JSON objects (default: true)
options.temperature  float    Controls randomness: 0 = deterministic, 2 = very random (default: 0.8)
options.num_predict  integer  Maximum tokens to generate (-1 = unlimited)
options.top_p        float    Nucleus sampling threshold (default: 0.9)
system               string   System prompt to use (overrides the model default)
context              int[]    Context array from a previous response, used to maintain the conversation
# Streaming generate request
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "Write a Python prime checker",
    "stream": true,
    "options": { "temperature": 0.7, "num_predict": 512 }
  }' \
  https://YOUR_OLLAMA_HOST/api/generate
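When stream is true, the body is newline-delimited JSON: each object carries a response fragment, and the final object has done: true plus a context array for follow-up turns. A minimal sketch of consuming the stream and reusing context (same placeholder host as above; the prompts are illustrative):
# Print tokens as they arrive; each NDJSON line carries a "response" fragment
curl -N -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b-instruct-q4_K_M", "prompt": "Write a haiku", "stream": true}' \
  https://YOUR_OLLAMA_HOST/api/generate | jq --unbuffered -rj '.response // empty'; echo
# Carry the conversation across calls: save the final context, send it back
CTX=$(curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b-instruct-q4_K_M", "prompt": "My name is Ada.", "stream": false}' \
  https://YOUR_OLLAMA_HOST/api/generate | jq -c '.context')
curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"qwen2.5-coder:7b-instruct-q4_K_M\", \"prompt\": \"What is my name?\", \"stream\": false, \"context\": $CTX}" \
  https://YOUR_OLLAMA_HOST/api/generate | jq -r '.response'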
Chat Completion
Multi-turn chat endpoint using the familiar {role, content} message format. Note that /api/chat is Ollama's native chat API; tools built on the OpenAI SDK (Continue.dev, LangChain, LlamaIndex, etc.) use Ollama's OpenAI-compatible endpoint at /v1/chat/completions instead.
POST /api/chat
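If you need the OpenAI request/response shape directly, Ollama serves /v1/chat/completions; whether it is reachable through this deployment depends on the reverse proxy allowing the /v1 path (an assumption, since only /api is documented here):
# OpenAI-compatible variant (assumes the proxy forwards /v1/*)
curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b-instruct-q4_K_M", "messages": [{"role":"user","content":"Hello"}]}' \
  https://YOUR_OLLAMA_HOST/v1/chat/completions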
About storage, analytics & request timeline

Chats are stored only in this browser (localStorage), keyed by your device ID. They are not sent to a separate “chat cloud” unless you use the API yourself.

Optional server beacons: POST /api/playground/analytics with the same auth as the API. Nginx must allow this path (see setup-domain.sh). Each payload includes clientProfile (user agent, screen, timezone, language) for attribution.

Request timeline (extra phases): When traffic goes through the ollama_logger proxy (default 11435), responses can include X-SDL-Proxy-Timing and stream markers for Ollama/proxy timing — not when the browser talks to Ollama directly. Queue depth is not exposed by Ollama over HTTP. Chat and Generate calls send keep_alive: "30m" so Ollama is more likely to keep the model loaded between turns (subject to server RAM settings).
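
If you call the API yourself, keep_alive can also be set per request; a minimal sketch (same placeholder host) that asks Ollama to unload the model right after the call:

# keep_alive: 0 unloads the model immediately; durations like "30m" keep it warm
curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5-coder:7b-instruct-q4_K_M", "prompt": "hi", "stream": false, "keep_alive": 0}' \
  https://YOUR_OLLAMA_HOST/api/generate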

Local diagnostics for this playground: what the browser knows about your API usage, storage, and a live probe of the server. Nothing here replaces server-side logs (e.g. SQLite from ollama_logger).

Parameters
cURL
parameter            type     description
model (required)     string   Model identifier
messages (required)  array    Array of {role, content} objects; roles: system, user, assistant
stream               boolean  Stream the response (default: false for this endpoint)
options.temperature  float    Sampling temperature, 0–2
options.num_predict  integer  Maximum tokens to generate
# Chat request (Ollama native API)
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "messages": [
      {"role":"system","content":"You are a coding assistant."},
      {"role":"user","content":"Explain async/await"}
    ],
    "stream": false
  }' \
  https://YOUR_OLLAMA_HOST/api/chat
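With stream set to true, /api/chat returns newline-delimited JSON like /api/generate, but the text fragments live under message.content. A sketch for printing the reply as it streams (same placeholder host):
# Stream a chat reply token by token
curl -N -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "messages": [{"role":"user","content":"Explain async/await"}],
    "stream": true
  }' \
  https://YOUR_OLLAMA_HOST/api/chat | jq --unbuffered -rj '.message.content // empty'; echo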
Generate Embeddings
Generate vector embeddings for a given text. Useful for semantic search, RAG pipelines, and similarity comparisons.
POST /api/embeddings
Parameters
cURL
parameter          type    description
model (required)   string  Model to generate embeddings with
prompt (required)  string  Text to generate embeddings for
Note: For production RAG use, consider a dedicated embedding model like nomic-embed-text — pull it with ollama pull nomic-embed-text. It's much faster and lighter than using a general model for embeddings.
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "The quick brown fox"
  }' \
  https://YOUR_OLLAMA_HOST/api/embeddings
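The response is a JSON object with a single embedding array of floats. A quick dimension sanity check with jq, using the nomic-embed-text model suggested above (assumes it has been pulled):
# Print the vector dimensionality (e.g. 768 for nomic-embed-text)
curl -s -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "prompt": "The quick brown fox"}' \
  https://YOUR_OLLAMA_HOST/api/embeddings | jq '.embedding | length'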