Developers
Smartloop exposes an OpenAI-compatible Chat Completions API, making it easy to build applications on top of it or migrate existing ones with minimal changes.
API Endpoint
First, get the base URL by running the following command in your terminal:
slp status
This returns a table listing the server address, loaded model, model size, memory usage, and GPU details for the device:
+-----------------+--------------------------------------------------------+
| Property | Value |
+-----------------+--------------------------------------------------------+
| Server | http://127.0.0.1:42669 |
| Model loaded | True |
| Flash attention | False |
| Model size | 1020 MB |
| Memory usage | 9% |
| GPU | NVIDIA GeForce RTX 4060 Laptop GPU |
| GPU memory | 7.6 GB |
+-----------------+--------------------------------------------------------+
In this case, the base_url is the Server value with /v1 appended: http://127.0.0.1:42669/v1
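The step above can be sketched as: take the Server value from the `slp status` table and append the `/v1` prefix under which the OpenAI-compatible API is served.

```python
# The "Server" row from the `slp status` table gives the host and port.
server = "http://127.0.0.1:42669"  # value taken from the table above

# The OpenAI-compatible API lives under the /v1 prefix.
base_url = f"{server}/v1"
print(base_url)
```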
The API follows the OpenAI Chat Completions specification, so any client or SDK that supports OpenAI will work out of the box.
Quick Start (Python)
Install the OpenAI Python SDK:
pip install openai
Send your first chat completion request:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:42669/v1",
api_key="not-needed",
)
response = client.chat.completions.create(
model="gemma3-1b",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
)
print(response.choices[0].message.content)
Quick Start (JavaScript)
Install the OpenAI Node.js SDK:
npm install openai
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://127.0.0.1:42669/v1",
apiKey: "not-needed",
});
const response = await client.chat.completions.create({
model: "gemma3-1b",
messages: [
{ role: "user", content: "What is the capital of France?" },
],
});
console.log(response.choices[0].message.content);
Quick Start (cURL)
curl http://127.0.0.1:42669/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3-1b",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
Message Roles
| Role | Description |
|---|---|
| system | Sets the behavior and context for the assistant |
| user | The end user's message |
| assistant | Previous assistant responses (for multi-turn conversations) |
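The three roles combine into a single messages list. A minimal sketch (the prompt text here is illustrative, not from the examples above):

```python
# A messages list using all three roles. The system message is optional
# but, when present, conventionally comes first.
messages = [
    {"role": "system", "content": "You are a concise geography tutor."},
    {"role": "user", "content": "Name one river in France."},
    {"role": "assistant", "content": "The Seine."},
    {"role": "user", "content": "And one in Germany?"},
]

# Each entry is a dict with exactly two keys: "role" and "content".
print(len(messages))
```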
Response Format
{
"id": "chatcmpl-...",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
]
}
Access the response content:
response.choices[0].message.content
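When working with the raw JSON directly (for example, from a cURL response), the same fields can be read with the standard library. A minimal sketch, using a sample payload shaped like the one above (with a placeholder id):

```python
import json

# A response shaped like the example above (id is a placeholder).
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
"""

data = json.loads(raw)
choice = data["choices"][0]

# "stop" means the model finished naturally; "length" would mean it hit
# the token limit and the reply may be truncated.
if choice["finish_reason"] == "stop":
    print(choice["message"]["content"])
```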
Multi-turn Conversations
To maintain context across messages, include the conversation history:
response = client.chat.completions.create(
model="gemma3-1b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "My name is Alice."},
{"role": "assistant", "content": "Hello Alice! How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
)
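Because the server is stateless, the client is responsible for carrying the history forward between turns. A minimal sketch of that bookkeeping, with the API call stubbed out so it runs without a live server (in real code, the stub would be `client.chat.completions.create(...)`):

```python
# Running conversation history; the system message comes first.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text, reply_for_demo):
    """Append the user turn, obtain a reply, append it, and return it.

    `reply_for_demo` stands in for the chat.completions.create() call
    so this sketch runs without a server.
    """
    history.append({"role": "user", "content": user_text})
    reply = reply_for_demo  # real code: call the API with `history` here
    history.append({"role": "assistant", "content": reply})
    return reply

ask("My name is Alice.", "Hello Alice! How can I help you today?")
ask("What's my name?", "Your name is Alice.")
print(len(history))  # system message plus two user/assistant exchanges
```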
All inference runs locally on your machine. Your data never leaves your device.