Developers

Smartloop exposes an OpenAI-compatible Chat Completions API, making it easy to build applications on top of it or migrate existing ones with minimal changes.

API Endpoint

First, get the base URL by running the following command in your terminal:

slp status

This returns a table listing the server address, model size, memory usage, and other properties for the device:

+-----------------+------------------------------------+
| Property        | Value                              |
+-----------------+------------------------------------+
| Server          | http://127.0.0.1:42669             |
| Model loaded    | True                               |
| Flash attention | False                              |
| Model size      | 1020 MB                            |
| Memory usage    | 9%                                 |
| GPU             | NVIDIA GeForce RTX 4060 Laptop GPU |
| GPU memory      | 7.6 GB                             |
+-----------------+------------------------------------+

In this case the base_url is http://127.0.0.1:42669/v1 (the Server value with /v1 appended).

The API follows the OpenAI Chat Completions specification, so any client or SDK that supports OpenAI will work out of the box.

Quick Start (Python)

Install the OpenAI Python SDK:

pip install openai

Send your first chat completion request:

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:42669/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="gemma3-1b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

print(response.choices[0].message.content)

Quick Start (JavaScript)

Install the OpenAI Node.js SDK:

npm install openai

Send a chat completion request:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://127.0.0.1:42669/v1",
  apiKey: "not-needed",
});

const response = await client.chat.completions.create({
  model: "gemma3-1b",
  messages: [
    { role: "user", content: "What is the capital of France?" },
  ],
});

console.log(response.choices[0].message.content);

Quick Start (cURL)

curl http://127.0.0.1:42669/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3-1b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Message Roles

| Role | Description |
|------|-------------|
| system | Sets the behavior and context for the assistant |
| user | The end user's message |
| assistant | Previous assistant responses (for multi-turn conversations) |

Response Format

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}

Access the response content:

response.choices[0].message.content

Multi-turn Conversations

To maintain context across messages, include the conversation history:

response = client.chat.completions.create(
    model="gemma3-1b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Hello Alice! How can I help you today?"},
        {"role": "user", "content": "What's my name?"},
    ],
)
Info

All inference runs locally on your machine. Your data never leaves your device.