Developers
Smartloop exposes an OpenAI-compatible Chat Completions API, making it easy to build applications on top of it or migrate existing ones with minimal changes.
API Endpoint
First, get the base URL by running the following command in your terminal:
slp status
This returns a table listing the server address, loaded model, model size, memory usage, and GPU details for the device:
+-----------------+--------------------------------------------------------+
| Property | Value |
+-----------------+--------------------------------------------------------+
| Server | http://127.0.0.1:42669 |
| Model loaded | True |
| Flash attention | False |
| Model size | 1020 MB |
| Memory usage | 9% |
| GPU | NVIDIA GeForce RTX 4060 Laptop GPU |
| GPU memory | 7.6 GB |
+-----------------+--------------------------------------------------------+
In this case, the base_url is the Server value with /v1 appended: http://127.0.0.1:42669/v1
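The step above can be sketched as: take the Server value from the `slp status` table and append the `/v1` prefix under which the OpenAI-compatible API is served.

```python
# The "Server" row from the `slp status` table gives the host and port.
server = "http://127.0.0.1:42669"  # value taken from the table above

# The OpenAI-compatible API lives under the /v1 prefix.
base_url = f"{server}/v1"
print(base_url)
```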
The API follows the OpenAI Chat Completions specification, so any client or SDK that supports OpenAI will work out of the box.
Quick Start (Python)
Install the OpenAI Python SDK:
pip install openai
Send your first chat completion request:
from openai import OpenAI
client = OpenAI(
base_url="http://127.0.0.1:42669/v1",
api_key="not-needed",
)
response = client.chat.completions.create(
model="gemma3-1b",
messages=[
{"role": "user", "content": "What is the capital of France?"}
],
)
print(response.choices[0].message.content)
Quick Start (JavaScript)
Install the OpenAI Node.js SDK:
npm install openai
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "http://127.0.0.1:42669/v1",
apiKey: "not-needed",
});
const response = await client.chat.completions.create({
model: "gemma3-1b",
messages: [
{ role: "user", content: "What is the capital of France?" },
],
});
console.log(response.choices[0].message.content);
Quick Start (cURL)
curl http://127.0.0.1:42669/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma3-1b",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
Message Roles
| Role | Description |
|---|---|
| system | Sets the behavior and context for the assistant |
| user | The end user's message |
| assistant | Previous assistant responses (for multi-turn conversations) |
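The three roles combine into a single messages list. A minimal sketch (the prompt text here is illustrative, not from the examples above):

```python
# A messages list using all three roles. The system message is optional
# but, when present, conventionally comes first.
messages = [
    {"role": "system", "content": "You are a concise geography tutor."},
    {"role": "user", "content": "Name one river in France."},
    {"role": "assistant", "content": "The Seine."},
    {"role": "user", "content": "And one in Germany?"},
]

# Each entry is a dict with exactly two keys: "role" and "content".
print(len(messages))
```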
Response Format
{
"id": "chatcmpl-...",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
]
}
Access the response content:
response.choices[0].message.content
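When working with the raw JSON directly (for example, from a cURL response), the same fields can be read with the standard library. A minimal sketch, using a sample payload shaped like the one above (with a placeholder id):

```python
import json

# A response shaped like the example above (id is a placeholder).
raw = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ]
}
"""

data = json.loads(raw)
choice = data["choices"][0]

# "stop" means the model finished naturally; "length" would mean it hit
# the token limit and the reply may be truncated.
if choice["finish_reason"] == "stop":
    print(choice["message"]["content"])
```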
Multi-turn Conversations
To maintain context across messages, include the conversation history:
response = client.chat.completions.create(
model="gemma3-1b",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "My name is Alice."},
{"role": "assistant", "content": "Hello Alice! How can I help you today?"},
{"role": "user", "content": "What's my name?"},
],
)
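Because the server is stateless, the client is responsible for carrying the history forward between turns. A minimal sketch of that bookkeeping, with the API call stubbed out so it runs without a live server (in real code, the stub would be `client.chat.completions.create(...)`):

```python
# Running conversation history; the system message comes first.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text, reply_for_demo):
    """Append the user turn, obtain a reply, append it, and return it.

    `reply_for_demo` stands in for the chat.completions.create() call
    so this sketch runs without a server.
    """
    history.append({"role": "user", "content": user_text})
    reply = reply_for_demo  # real code: call the API with `history` here
    history.append({"role": "assistant", "content": reply})
    return reply

ask("My name is Alice.", "Hello Alice! How can I help you today?")
ask("What's my name?", "Your name is Alice.")
print(len(history))  # system message plus two user/assistant exchanges
```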
All inference runs locally on your machine. Your data never leaves your device.