OpenAI compatibility
Note: OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama Python library, JavaScript library and REST API.
Ollama provides experimental compatibility with parts of the OpenAI API to help connect existing applications to Ollama.
Usage
OpenAI Python library
python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)

# Chat completion
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)

# Chat completion with image input
response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "",
                },
            ],
        }
    ],
    max_tokens=300,
)

# Text completion
completion = client.completions.create(
    model="llama3.2",
    prompt="Say this is a test",
)

# List the available models
list_completion = client.models.list()

# Retrieve a single model
model = client.models.retrieve("llama3.2")

# Embeddings
embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)
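The vision example leaves image_url empty. Because the endpoint accepts base64-encoded images rather than remote image URLs (see the supported request fields for /v1/chat/completions), one way to fill it is to encode a local file. This is a minimal sketch, assuming a string of the form data:<mime type>;base64,<payload> is accepted. The helpers image_data_url and image_part are hypothetical names, not part of the OpenAI library or Ollama.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image file: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    # Assumption: the server accepts a data URL with this prefix.
    return f"data:{mime};base64,{encoded}"


def image_part(path: str) -> dict:
    """Build an image_url content part for the messages array."""
    return {"type": "image_url", "image_url": image_data_url(path)}
```

The result of image_part("photo.png") can be placed next to the text part in the content array of the vision request.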
OpenAI JavaScript library
javascript
import OpenAI from 'openai'

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1/',
  // required but ignored
  apiKey: 'ollama',
})

// Chat completion
const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'llama3.2',
})

// Chat completion with image input
const response = await openai.chat.completions.create({
  model: "llava",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url: "",
        },
      ],
    },
  ],
})

// Text completion
const completion = await openai.completions.create({
  model: "llama3.2",
  prompt: "Say this is a test.",
})

// List the available models
const listCompletion = await openai.models.list()

// Retrieve a single model
const model = await openai.models.retrieve("llama3.2")

// Embeddings
const embedding = await openai.embeddings.create({
  model: "all-minilm",
  input: ["why is the sky blue?", "why is the grass green?"],
})
curl
shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llava",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What'\''s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": ""
                        }
                    }
                ]
            }
        ],
        "max_tokens": 300
    }'

curl http://localhost:11434/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "llama3.2",
        "prompt": "Say this is a test"
    }'

curl http://localhost:11434/v1/models

curl http://localhost:11434/v1/models/llama3.2

curl http://localhost:11434/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{
        "model": "all-minilm",
        "input": ["why is the sky blue?", "why is the grass green?"]
    }'
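Where neither curl nor the OpenAI client library is available, the first chat request above can also be built with Python's standard library. This is a minimal sketch, not an official client: build_chat_request and send are hypothetical helpers, and BASE_URL assumes the default local address used throughout this page.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"  # assumption: default local server address


def build_chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


request = build_chat_request(
    "llama3.2",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
```

Calling send(request) performs the network call, so it needs a running server with the model already pulled.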
Endpoints
/v1/chat/completions
Supported features
- [x] Chat completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [x] Vision
- [x] Tools (streaming support coming soon)
- [ ] Logprobs
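Streamed responses from OpenAI-style chat endpoints arrive as server-sent events: each event line starts with data: and carries a JSON chunk, and the stream ends with data: [DONE]. The sketch below assumes that wire format and the usual choices[0].delta.content layout of chat completion chunks. collect_stream_text is a hypothetical helper that works on any iterable of decoded lines, so it can be tried without a server; the sample lines are illustrative, not captured output.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from a stream of SSE lines."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                pieces.append(content)
    return "".join(pieces)


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints "Hello!"
```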
Supported request fields
- [x] model
- [x] messages
  - [x] Text content
  - [x] Image content
    - [x] Base64 encoded image
    - [ ] Image URL
  - [x] Array of content parts
- [x] Text
- [x] frequency_penalty
- [x] presence_penalty
- [x] response_format
- [x] seed
- [x] stop
- [x] stream
- [x] temperature
- [x] top_p
- [x] max_tokens
- [x] tools
- [ ] tool_choice
- [ ] logit_bias
- [ ] user
- [ ] n
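Existing OpenAI applications may send fields that are unchecked in the list above. A small pre-flight check can surface them before a request goes out. This is a sketch only: the set below is copied from the checklist in this section and will drift as support changes, and split_unsupported is a hypothetical helper. The example payload also shows seed (reproducible outputs) and response_format (JSON mode) as they appear in an OpenAI-style request body.

```python
# Copied from the unchecked entries in the list above; update as support changes.
UNSUPPORTED_CHAT_FIELDS = frozenset({"tool_choice", "logit_bias", "user", "n"})


def split_unsupported(payload: dict) -> tuple:
    """Return (payload without unsupported fields, sorted names that were dropped)."""
    dropped = sorted(key for key in payload if key in UNSUPPORTED_CHAT_FIELDS)
    kept = {
        key: value
        for key, value in payload.items()
        if key not in UNSUPPORTED_CHAT_FIELDS
    }
    return kept, dropped


payload = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Reply in JSON."}],
    "seed": 42,                                  # reproducible outputs
    "response_format": {"type": "json_object"},  # JSON mode
    "n": 2,
    "user": "alice",
}
kept, dropped = split_unsupported(payload)
print(dropped)  # prints ['n', 'user']
```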
/v1/completions
Supported features
- [x] Completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [ ] Logprobs
Supported request fields
- [x] model
- [x] prompt
- [x] frequency_penalty
- [x] presence_penalty
- [x] seed
- [x] stop
- [x] stream
- [x] temperature
- [x] top_p
- [x] max_tokens
- [x] suffix
- [ ] best_of
- [ ] echo
- [ ] logit_bias
- [ ] user
- [ ] n
Notes
- prompt currently only accepts a string
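Because prompt accepts only a single string here, code written for OpenAI that passes a list of prompts needs to fan out into one request per prompt. This is a sketch under that assumption; completion_payloads is a hypothetical helper that only builds request bodies and sends nothing.

```python
from typing import Sequence, Union


def completion_payloads(model: str, prompt: Union[str, Sequence[str]], **options) -> list:
    """Build one /v1/completions request body per prompt string."""
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    for item in prompts:
        if not isinstance(item, str):
            raise TypeError("each prompt must be a string")
    return [{"model": model, "prompt": item, **options} for item in prompts]


bodies = completion_payloads(
    "llama3.2",
    ["Say this is a test", "Say this is another test"],
    seed=42,
)
print(len(bodies))  # prints 2
```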
/v1/models
Notes
- created corresponds to when the model was last modified
- owned_by corresponds to the Ollama username, defaulting to "library"
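In OpenAI-style model objects, created is a Unix timestamp in seconds. The sketch below converts it to a datetime when reading a /v1/models response. The sample payload is illustrative and was not captured from a real server, and describe_models is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# Illustrative response body in the OpenAI list-models shape.
sample_listing = json.loads(
    """
    {
      "object": "list",
      "data": [
        {"id": "llama3.2", "object": "model", "created": 1700000000, "owned_by": "library"}
      ]
    }
    """
)


def describe_models(listing: dict) -> list:
    """Return (id, owner, last-modified time in UTC) for each listed model."""
    return [
        (
            model["id"],
            model["owned_by"],
            datetime.fromtimestamp(model["created"], tz=timezone.utc),
        )
        for model in listing["data"]
    ]


for name, owner, modified in describe_models(sample_listing):
    print(name, owner, modified.isoformat())
```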
/v1/models/{model}
Notes
- created corresponds to when the model was last modified
- owned_by corresponds to the Ollama username, defaulting to "library"
/v1/embeddings
Supported request fields
- [x] model
- [x] input
  - [x] string
  - [x] array of strings
  - [ ] array of tokens
  - [ ] array of token arrays
- [ ] encoding_format
- [ ] dimensions
- [ ] user
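The input field takes either one string or an array of strings, while token arrays are not supported. This is a sketch of normalising caller input before building the request body; embedding_payload is a hypothetical helper and sends nothing itself.

```python
from typing import Sequence, Union


def embedding_payload(model: str, texts: Union[str, Sequence[str]]) -> dict:
    """Build a /v1/embeddings request body from a string or a list of strings."""
    items = [texts] if isinstance(texts, str) else list(texts)
    if not items or not all(isinstance(item, str) for item in items):
        # Token arrays (lists of integers) end up here.
        raise TypeError("input must be a string or a non-empty array of strings")
    return {"model": model, "input": items}


body = embedding_payload("all-minilm", ["why is the sky blue?", "why is the grass green?"])
print(len(body["input"]))  # prints 2
```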
Models
Before using a model, pull it locally with ollama pull:
shell
ollama pull llama3.2
Default model names
For tooling that relies on default OpenAI model names such as gpt-3.5-turbo, use ollama cp to copy an existing model name to a temporary name:
ollama cp llama3.2 gpt-3.5-turbo
Afterwards, this new model name can be specified in the model field:
shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'
Setting the context size
The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a Modelfile which looks like:
modelfile
FROM <some model>
PARAMETER num_ctx <context size>
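When several context sizes are needed, the Modelfile text can be generated rather than written by hand. This is a minimal sketch: render_modelfile is a hypothetical helper, and the base model and context size shown are example values only.

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that sets num_ctx on top of an existing model."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive integer")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


text = render_modelfile("llama3.2", 8192)
print(text, end="")
# FROM llama3.2
# PARAMETER num_ctx 8192
```

Writing the returned text to a file named Modelfile gives the input for the create step.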
Use the ollama create mymodel command to create a new model with the updated context size. Call the API with the updated model name:
shell
curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "mymodel",
        "messages": [
            {
                "role": "user",
                "content": "Hello!"
            }
        ]
    }'