List Models
Returns all models currently available on this Ollama instance.
/api/tags
Response schema
cURL
| field | type | description |
|---|---|---|
| models | array | List of installed model objects |
| models[].name | string | Model identifier, e.g. "qwen2.5-coder:7b-instruct-q4_K_M" |
| models[].size | integer | Size on disk in bytes |
| models[].digest | string | SHA-256 digest of the model |
| models[].modified_at | string | ISO 8601 last-modified timestamp |
curl -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  https://YOUR_OLLAMA_HOST/api/tags | jq
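The response can be post-processed in any language; here is a minimal Python sketch (using the response schema documented above, with a made-up sample payload) that prints each installed model with a human-readable size:

```python
import json

def human_size(num_bytes: float) -> str:
    """Render a byte count as a human-readable string."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PiB"

def summarize_models(payload: dict) -> list:
    """Map a /api/tags response to (name, human-readable size) pairs."""
    return [(m["name"], human_size(m["size"])) for m in payload["models"]]

# Hypothetical sample response in the shape documented above
sample = json.loads(
    '{"models": [{"name": "qwen2.5-coder:7b-instruct-q4_K_M", '
    '"size": 4683073184, "digest": "sha256:abc123", '
    '"modified_at": "2024-11-05T12:00:00Z"}]}'
)
for name, size in summarize_models(sample):
    print(f"{name}\t{size}")
```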
Generate Completion
Generate a response for a given prompt. Supports streaming token-by-token output.
/api/generate
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Name of the model to use |
| prompt (required) | string | The prompt to generate a response for |
| stream | boolean | If true, returns a stream of JSON objects (default: true) |
| options.temperature | float | Controls randomness: 0 = deterministic, 2 = very random (default: 0.8) |
| options.num_predict | integer | Maximum tokens to generate (-1 = unlimited) |
| options.top_p | float | Nucleus sampling threshold (default: 0.9) |
| system | string | System prompt to use (overrides the model default) |
| context | int[] | Context from a previous response, used to maintain the conversation |
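The context parameter carries conversation state between /api/generate calls: each non-streaming response includes a context array to pass back with the next prompt. A small Python sketch (hypothetical helper name; only parameters from the table above) that builds such a follow-up request:

```python
def follow_up_request(prev_response: dict, prompt: str, model: str) -> dict:
    """Build a /api/generate payload that continues a prior exchange by
    passing back the context array from the previous response."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if "context" in prev_response:
        payload["context"] = prev_response["context"]
    return payload

# Hypothetical previous response carrying a context array
prev = {"response": "def is_prime(n): ...", "done": True, "context": [1, 2, 3]}
req = follow_up_request(prev, "Now add unit tests",
                        "qwen2.5-coder:7b-instruct-q4_K_M")
```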
# Streaming generate request
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "Write a Python prime checker",
    "stream": true,
    "options": { "temperature": 0.7, "num_predict": 512 }
  }' \
  https://YOUR_OLLAMA_HOST/api/generate
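With stream: true the server returns newline-delimited JSON objects, each carrying a response fragment and a done flag. A minimal Python sketch of consuming such a stream (operating on pre-read lines rather than a live HTTP connection):

```python
import json

def accumulate_stream(lines) -> str:
    """Concatenate the 'response' fields of streamed NDJSON chunks,
    stopping at the chunk marked done: true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Hypothetical stream chunks in the documented shape
chunks = [
    '{"response": "def is_", "done": false}',
    '{"response": "prime(n):", "done": false}',
    '{"response": " ...", "done": true}',
]
print(accumulate_stream(chunks))  # def is_prime(n): ...
```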
Chat Completion
Chat endpoint that accepts an OpenAI-style messages array, so it maps naturally onto tools like Continue.dev, LangChain, and LlamaIndex. Note that /api/chat is Ollama's native chat API; clients that speak the OpenAI SDK wire format directly typically target Ollama's OpenAI-compatible endpoint at /v1/chat/completions instead.
/api/chat
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Model identifier |
| messages (required) | array | Array of {role, content} objects. Roles: system, user, assistant |
| stream | boolean | Stream the response (default: false for this endpoint) |
| options.temperature | float | Sampling temperature, 0–2 |
| options.num_predict | integer | Maximum tokens to generate |
# Chat request with a system prompt
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "messages": [
      {"role":"system","content":"You are a coding assistant."},
      {"role":"user","content":"Explain async/await"}
    ],
    "stream": false
  }' \
  https://YOUR_OLLAMA_HOST/api/chat
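Multi-turn conversations with /api/chat are stateless: the client resends the full messages array on every call. A small Python sketch (hypothetical helper; request shape as documented above) of extending the history for the next turn:

```python
def next_turn(messages: list, assistant_reply: str, user_message: str,
              model: str = "qwen2.5-coder:7b-instruct-q4_K_M") -> dict:
    """Append the assistant's last reply and the new user message,
    then build the next /api/chat payload."""
    history = messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_message},
    ]
    return {"model": model, "messages": history, "stream": False}

start = [
    {"role": "system", "content": "You are a coding assistant."},
    {"role": "user", "content": "Explain async/await"},
]
req = next_turn(start, "async/await lets you ...", "Show a code example")
# req["messages"] now holds four entries: system, user, assistant, user
```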
Generate Embeddings
Generate vector embeddings for a given text. Useful for semantic search, RAG pipelines, and similarity comparisons.
/api/embeddings
Parameters
cURL
| parameter | type | description |
|---|---|---|
| model (required) | string | Model to generate embeddings with |
| prompt (required) | string | Text to generate embeddings for |
Note: For production RAG use, consider a dedicated embedding model like nomic-embed-text, pulled with ollama pull nomic-embed-text. It's much faster and lighter than using a general model for embeddings.
curl -X POST -s \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b-instruct-q4_K_M",
    "prompt": "The quick brown fox"
  }' \
  https://YOUR_OLLAMA_HOST/api/embeddings
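For semantic search and similarity comparisons, embedding vectors are typically scored with cosine similarity. A self-contained Python sketch, independent of any particular response shape:

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors:
    dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0, orthogonal directions 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```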