List Models
Returns all models currently available on this Ollama instance.
/api/tags
Response schema
cURL
| field | type | description |
|---|---|---|
| models | array | List of installed model objects |
| models[].name | string | Model identifier, e.g. "qwen2.5-coder:7b-instruct-q4_K_M" |
| models[].size | integer | Size on disk in bytes |
| models[].digest | string | SHA-256 digest of the model |
| models[].modified_at | string | ISO 8601 last-modified timestamp |
curl -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  https://YOUR_OLLAMA_HOST/api/tags | jq
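The response can be post-processed in any language; here is a minimal Python sketch (using the response schema documented above, with a made-up sample payload) that prints each installed model with a human-readable size:

```python
import json

def human_size(num_bytes: float) -> str:
    """Render a byte count as a human-readable string."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PiB"

def summarize_models(payload: dict) -> list:
    """Map a /api/tags response to (name, human-readable size) pairs."""
    return [(m["name"], human_size(m["size"])) for m in payload["models"]]

# Hypothetical sample response in the shape documented above
sample = json.loads(
    '{"models": [{"name": "qwen2.5-coder:7b-instruct-q4_K_M", '
    '"size": 4683073184, "digest": "sha256:abc123", '
    '"modified_at": "2024-11-05T12:00:00Z"}]}'
)
for name, size in summarize_models(sample):
    print(f"{name}\t{size}")
```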
Generate Completion
Generate a response for a given prompt. Supports streaming token-by-token output.
/api/generate
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Name of the model to use |
| prompt (required) | string | The prompt to generate a response for |
| stream | boolean | If true, returns a stream of JSON objects (default: true) |
| options.temperature | float | Controls randomness: 0 = deterministic, 2 = very random (default: 0.8) |
| options.num_predict | integer | Maximum tokens to generate (-1 = unlimited) |
| options.top_p | float | Nucleus sampling threshold (default: 0.9) |
| system | string | System prompt to use (overrides the model default) |
| context | int[] | Context from a previous response, used to maintain the conversation |
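The context parameter carries conversation state between /api/generate calls: each non-streaming response includes a context array to pass back with the next prompt. A small Python sketch (hypothetical helper name; only parameters from the table above) that builds such a follow-up request:

```python
def follow_up_request(prev_response: dict, prompt: str, model: str) -> dict:
    """Build a /api/generate payload that continues a prior exchange by
    passing back the context array from the previous response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if "context" in prev_response:
        payload["context"] = prev_response["context"]
    return payload

# Hypothetical previous response carrying a context array
prev = {"response": "def is_prime(n): ...", "done": True, "context": [1, 2, 3]}
req = follow_up_request(prev, "Now add unit tests",
                        "qwen2.5-coder:7b-instruct-q4_K_M")
```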
# Streaming generate request
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "Write a Python prime checker",
    "stream": true,
    "options": { "temperature": 0.7, "num_predict": 512 }
  }' \
  https://YOUR_OLLAMA_HOST/api/generate
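With stream: true the server returns newline-delimited JSON objects, each carrying a response fragment and a done flag. A minimal Python sketch of consuming such a stream (operating on pre-read lines rather than a live HTTP connection):

```python
import json

def accumulate_stream(lines) -> str:
    """Concatenate the 'response' fields of streamed NDJSON chunks,
    stopping at the chunk marked done: true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Hypothetical stream chunks in the documented shape
chunks = [
    '{"response": "def is_", "done": false}',
    '{"response": "prime(n):", "done": false}',
    '{"response": " ...", "done": true}',
]
print(accumulate_stream(chunks))  # def is_prime(n): ...
```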
Chat Completion
Chat endpoint that accepts an OpenAI-style messages array, so it maps naturally onto tools like Continue.dev, LangChain, and LlamaIndex. Note that /api/chat is Ollama's native chat API; clients that speak the OpenAI SDK wire format directly typically target Ollama's OpenAI-compatible endpoint at /v1/chat/completions instead.
/api/chat
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Model identifier |
| messages (required) | array | Array of {role, content} objects. Roles: system, user, assistant |
| stream | boolean | Stream the response (default: false for this endpoint) |
| options.temperature | float | Sampling temperature, 0–2 |
| options.num_predict | integer | Maximum tokens to generate |
# Chat request with a system prompt
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "messages": [
      {"role":"system","content":"You are a coding assistant."},
      {"role":"user","content":"Explain async/await"}
    ],
    "stream": false
  }' \
  https://YOUR_OLLAMA_HOST/api/chat
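Multi-turn conversations with /api/chat are stateless: the client resends the full messages array on every call. A small Python sketch (hypothetical helper; request shape as documented above) of extending the history for the next turn:

```python
def next_turn(messages: list, assistant_reply: str, user_message: str,
              model: str = "qwen2.5-coder:7b-instruct-q4_K_M") -> dict:
    """Append the assistant's last reply and the new user message,
    then build the next /api/chat payload."""
    history = messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_message},
    ]
    return {"model": model, "messages": history, "stream": False}

start = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Explain async/await"},
]
req = next_turn(start, "async/await lets you ...", "Show a code example")
# req["messages"] now holds four entries: system, user, assistant, user
```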
Generate Embeddings
Generate vector embeddings for a given text. Useful for semantic search, RAG pipelines, and similarity comparisons.
/api/embeddings
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Model to generate embeddings with |
| prompt (required) | string | Text to generate embeddings for |
Note: For production RAG use, consider a dedicated embedding model like nomic-embed-text, pulled with ollama pull nomic-embed-text. It's much faster and lighter than using a general model for embeddings.
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "The quick brown fox"
  }' \
  https://YOUR_OLLAMA_HOST/api/embeddings
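For semantic search and similarity comparisons, embedding vectors are typically scored with cosine similarity. A self-contained Python sketch, independent of any particular response shape:

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors:
    dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0, orthogonal directions 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```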