Instructions to use selorahomes/Selora-AI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

Transformers

How to use selorahomes/Selora-AI with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="selorahomes/Selora-AI")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("selorahomes/Selora-AI", dtype="auto")

llama-cpp-python

How to use selorahomes/Selora-AI with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="selorahomes/Selora-AI",
	filename="qwen3_17b_base.Q6_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use selorahomes/Selora-AI with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf selorahomes/Selora-AI:Q6_K
# Run inference directly in the terminal:
llama cli -hf selorahomes/Selora-AI:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf selorahomes/Selora-AI:Q6_K
# Run inference directly in the terminal:
llama cli -hf selorahomes/Selora-AI:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf selorahomes/Selora-AI:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf selorahomes/Selora-AI:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf selorahomes/Selora-AI:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf selorahomes/Selora-AI:Q6_K

Use Docker

docker model run hf.co/selorahomes/Selora-AI:Q6_K

vLLM

How to use selorahomes/Selora-AI with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "selorahomes/Selora-AI"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "selorahomes/Selora-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/selorahomes/Selora-AI:Q6_K

SGLang

How to use selorahomes/Selora-AI with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "selorahomes/Selora-AI" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "selorahomes/Selora-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "selorahomes/Selora-AI" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "selorahomes/Selora-AI",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use selorahomes/Selora-AI with Ollama:
```
ollama run hf.co/selorahomes/Selora-AI:Q6_K
```

Unsloth Studio

How to use selorahomes/Selora-AI with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for selorahomes/Selora-AI to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for selorahomes/Selora-AI to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for selorahomes/Selora-AI to start chatting

Pi

How to use selorahomes/Selora-AI with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf selorahomes/Selora-AI:Q6_K

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "selorahomes/Selora-AI:Q6_K"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use selorahomes/Selora-AI with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf selorahomes/Selora-AI:Q6_K

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default selorahomes/Selora-AI:Q6_K

Run Hermes

hermes

Docker Model Runner
How to use selorahomes/Selora-AI with Docker Model Runner:
```
docker model run hf.co/selorahomes/Selora-AI:Q6_K
```

Lemonade

How to use selorahomes/Selora-AI with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull selorahomes/Selora-AI:Q6_K

Run and chat with the model

lemonade run user.Selora-AI-Q6_K

List all available models

lemonade list

Selora Homes: selorahomes.com
Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai

Selora AI

Selora AI is an instruction-tuned language model for Home Assistant, the open-source smart home platform. Rather than one general model, it is five small, single-purpose LoRA specialists. Each emits one strict, compact JSON "slim envelope" for its intent, and the Selora AI integration executes that envelope against your home — running service calls, resolving live state, or saving an automation (see Specialists for the exact shapes). The five specialists:

command — control devices ("turn off the kitchen lights")
automation — author home automations and blueprints
answer — answer questions about the home's live state
clarification — ask a follow-up when a request is ambiguous
utilities — docs-grounded maintenance, troubleshooting, and setup help

The five specialists are LoRA adapters fine-tuned on a shared Qwen3 1.7B base.

Base: Qwen3 1.7B · Format: GGUF Q6_K base + 5 per-specialist LoRA adapters (F16) · License: Apache-2.0

Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM — built for self-hosted IoT deployments that stay private and offline-first.

Use cases

Chat control of smart-home devices — "turn off the kitchen lights", "set the thermostat to 68", "open the garage door" — resolved against live Home Assistant entity state.
Natural-language home automation creation — describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML as a draft you review before it's saved.
Scene and routine orchestration — chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
Q&A about your home — "is the laundry running?", "what's the temperature upstairs?" — the answer adapter returns the entities to check plus a response template, and the integration fills in the live state.
Docs-grounded help — "why is the living-room sensor unavailable?", "how do I add the Hue integration?", "which integrations have pending updates?" — answered from retrieved Home Assistant documentation, with live state pulled in via entity placeholders and citations back to the source docs.
Privacy-first home assistant — runs entirely on local hardware (Raspberry Pi 5, Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.

Specialists

Every specialist LoRA emits a compact "slim envelope" — a small JSON object whose high-frequency keys are single characters to save tokens — and that envelope is an intermediate representation, not a finished action. The Selora AI integration's _convert_slim_shape parser turns it into real Home Assistant activity: run a downloaded GGUF on its own and you get the envelope back; it takes the integration (or an equivalent parser) to execute service calls, resolve live state, build YAML, or attach citations. By design the model never fabricates device state — it emits the calls to make, the entities to look up, the question to ask, or the automation to build, and the integration completes the loop against live HA.

| Adapter | Emits (raw model output) | What the integration does with it |
| --- | --- | --- |
| `command` | `{"c":[{"s":"<service>","e":"<entity_id>","d":{…}}],"r":"<confirmation>"}` | Executes each `c` as an HA service call, in array order, then shows `r`. The model does not call services itself. |
| `answer` | `{"r":"<text with {entity_id} placeholders>","q":["<entity_id>",…]}` | Looks up the `q` entities' live state and substitutes it into the `{entity_id}` placeholders in `r`. The model templates state; it never reads it. |
| `clarification` | `{"q":"<question>","o":["<option>",…]}` | Surfaces the question and the optional quick-reply `o` options; the user's next reply is the action. |
| `utilities` | `{"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]}` | Substitutes live `q` state into the advice and surfaces the `src` citations. Advice is grounded in a `RELEVANT DOCS` block injected at inference. |
| `automation` | Blueprint request → a fenced `yaml` HA blueprint (`blueprint:` / `input:` / `!input`). Concrete request → `{"intent":"automation","response":…,"automation":{"alias":…,"triggers":[…],"conditions":[…],"actions":[…]}}` | Uses the blueprint YAML (nearly) as-is, or maps the concrete `automation` object JSON→YAML into a saved HA automation. |

Key map, applied once on the HA side: r=response, q=query / entity list, c=calls, s=service, e=entity_id, d=data (service params), o=options.
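
The key map can be illustrated with a small sketch that expands a `command` envelope into long-form keys. This is not the integration's `_convert_slim_shape`; the function name `expand_envelope` and the two lookup tables are hypothetical, and the sketch assumes that `s`, `e`, and `d` only appear inside the entries of `c`.

```python
import json

# Single-character keys from the key map above (table names are hypothetical).
TOP_LEVEL_KEYS = {"r": "response", "q": "query", "c": "calls", "o": "options"}
CALL_KEYS = {"s": "service", "e": "entity_id", "d": "data"}


def expand_envelope(envelope):
    """Return a copy of a slim envelope with long-form key names."""
    expanded = {}
    for key, value in envelope.items():
        if key == "c":
            # Each call keeps its array position; only its keys are renamed.
            # The contents of "d" (service params) are passed through untouched.
            expanded["calls"] = [
                {CALL_KEYS.get(k, k): v for k, v in call.items()} for call in value
            ]
        else:
            expanded[TOP_LEVEL_KEYS.get(key, key)] = value
    return expanded


raw = '{"c":[{"s":"light.turn_off","e":"light.kitchen"}],"r":"Kitchen light off."}'
expanded = expand_envelope(json.loads(raw))
print(json.dumps(expanded, indent=2))
```

Unknown keys such as `src` pass through unchanged, which keeps the sketch usable for the `utilities` envelope as well.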

The integration's selora_local provider classifies each request to one specialist (a regex pre-classifier), then activates the matching adapter — llama-server's /lora-adapters hot-swap on the production hub, or vLLM --enable-lora.

Example envelopes

Real output for each specialist — the raw model emission; the integration then executes or resolves it:

// command — "Turn off the kitchen light and lock the front door"
{"c":[{"s":"light.turn_off","e":"light.kitchen"},{"s":"lock.lock","e":"lock.front_door"}],"r":"Kitchen light off and front door locked."}

// answer — "Is the bedroom light on?"
{"q":["light.bedroom"],"r":"The bedroom light is {light.bedroom}."}

// clarification — "Turn on a light" (when several exist)
{"q":"Which light — kitchen, bedroom, or living room?","o":["kitchen","bedroom","living room"]}

// utilities — "Why does the Hue bridge show an update?"
{"r":"The {update.philips_hue} update is pending; install it from Settings → Devices & Services → Updates.","q":["update.philips_hue"],"src":["home-assistant/updates#install"]}

// automation (concrete) — "Lock the front door at 10pm if it's still unlocked"
{"intent":"automation","response":"Created the 10pm auto-lock.","automation":{"alias":"Nightly Auto-Lock","triggers":[{"trigger":"time","at":"22:00:00"}],"conditions":[{"condition":"state","entity_id":"lock.front_door","state":"unlocked"}],"actions":[{"service":"lock.lock","entity_id":"lock.front_door"}]}}

The automation adapter's blueprint mode instead emits a fenced yaml block — a complete HA blueprint with blueprint:, input:, and !input wiring — rather than the concrete JSON shown above.
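
To make the `answer` flow concrete, here is a minimal sketch of how a consumer might fill the `{entity_id}` placeholders from live state. The function name `resolve_answer`, the `states` dictionary, and the placeholder regex are assumptions for illustration; the real integration reads state from Home Assistant rather than from a plain dict.

```python
import re

# Assumed placeholder shape: {domain.object_id} using lowercase letters, digits, underscores.
PLACEHOLDER = re.compile(r"\{([a-z0-9_]+\.[a-z0-9_]+)\}")


def resolve_answer(envelope, states):
    """Substitute live state into r for the entities listed in q."""
    requested = set(envelope.get("q", []))

    def substitute(match):
        entity_id = match.group(1)
        if entity_id in requested and entity_id in states:
            return states[entity_id]
        # Leave placeholders alone when the entity was not requested or has no state.
        return match.group(0)

    return PLACEHOLDER.sub(substitute, envelope["r"])


envelope = {"q": ["light.bedroom"], "r": "The bedroom light is {light.bedroom}."}
print(resolve_answer(envelope, {"light.bedroom": "on"}))
```

Restricting substitution to the entities named in `q` is a design choice of this sketch: it keeps the model's declared lookups and the rendered text in agreement.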

Why split into specialist LoRAs

The split is what makes high task accuracy attainable on small (1.7B) weights, and it is the reason the model behaves well on the evaluation surfaces below:

Contract-first, by construction. The slim-envelope schema was designed up front and baked into every training example, so each specialist learns one clean, deterministic output shape — exactly what the integration needs to parse, and what lets the router fire the right adapter.
One job per adapter. Each LoRA learns a single tightly-constrained output contract (its slim envelope) rather than a general chat distribution, so it can saturate that one task instead of trading off across many.
Intent, not execution. Execution, live-state lookup, and YAML construction are the integration's responsibility. The model never invents state or device values — it templates them — which removes a whole class of hallucination failures.
Slim envelopes are short. Single-char-keyed JSON means fewer tokens to mispredict and faster generation, and the rigid shape is trivial for the integration to parse and validate.
Shared base, hot-swapped adapters. One Q6_K base (~1.6 GB) loads once; the right ~10–40 MB adapter is activated per request, so five specialists cost roughly the memory of a single model and every request is served by the adapter trained for it.
Multi-state entity context (v0.4.8). Per-entity attribute tails in the AVAILABLE ENTITIES block give each specialist richer single-turn grounding.

Results

Evaluated on the open allenporter/home-assistant-datasets suite — the basis of the Home Assistant LLM leaderboard (v0.4.8, temperature 0):

| Surface | Score | Pass / total | 95% CI | Scored on |
| --- | --- | --- | --- | --- |
| assist | 86.2% | 399 / 463 | ±3.1 | raw model + integration |
| assist-mini | 90.3% | 177 / 196 | ±4.1 | raw model + integration |
| questions | 94.3% | 349 / 370 | ±2.4 | raw model + integration |
| automations | 66.7% | 40 / 60 | ±11.9 | raw model |

See Evaluation for what each surface measures — the LoRA adapter vs the integration vs the raw model.
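
The scores and intervals in the table can be reproduced from the pass / total counts. The card does not say how the 95% CI was computed; the sketch below assumes a normal-approximation (Wald) interval in percentage points, which matches the published values to one decimal place. The function name `wald_interval` is illustrative.

```python
import math


def wald_interval(passed, total, z=1.96):
    """Return (score, half_width) in percentage points for a pass rate."""
    p = passed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * p, 100 * half_width


results = {
    "assist": (399, 463),
    "assist-mini": (177, 196),
    "questions": (349, 370),
    "automations": (40, 60),
}

for surface, (passed, total) in results.items():
    score, half_width = wald_interval(passed, total)
    print(f"{surface}: {score:.1f}% ±{half_width:.1f}")
```

The wide interval on automations follows directly from its small sample (60 cases), so that score is the least certain of the four.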

Quick start

You have a choice in how you start with Selora AI:

Ready to deploy with Home Assistant? Use llama-server — the runtime the HA integration is built around.
Want to evaluate the model first? Use Ollama — try each specialist on your machine, smoke-test the LoRAs on your hardware, decide if Selora AI is right for you before committing to the full Home Assistant integration.
Serving in the cloud? Use vLLM.

llama-server (Home Assistant integration runtime)

The reference runtime — what the model was trained against and what the Home Assistant integration uses. llama-server's /lora-adapters endpoint is the in-process LoRA hot-swap that lets the integration pick a specialist per turn without reloading the base.

Download the base and all five LoRA files into a single directory, then:

llama-server \
  --model qwen3_17b_base.Q6_K.gguf \
  --lora-init-without-apply \
  --lora selora-command.f16.gguf \
  --lora selora-automation.f16.gguf \
  --lora selora-answer.f16.gguf \
  --lora selora-clarification.f16.gguf \
  --lora selora-utilities.f16.gguf \
  --ctx-size 8192

POST to /lora-adapters to switch the active LoRA before each /v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
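
A rough sketch of the switch-then-chat sequence, using only the standard library. It assumes the adapter ids follow the order of the `--lora` flags above (0=command through 4=utilities) and that `/lora-adapters` accepts a JSON list of `{"id", "scale"}` objects, as described in the llama.cpp server documentation; check both against your build. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed slot ids, matching the order of the --lora flags in the command above.
SLOTS = {"command": 0, "automation": 1, "answer": 2, "clarification": 3, "utilities": 4}


def adapter_payload(specialist):
    """Scale 1.0 for the chosen adapter and 0.0 for every other slot."""
    active = SLOTS[specialist]
    return [{"id": slot, "scale": 1.0 if slot == active else 0.0} for slot in SLOTS.values()]


def post_json(url, body):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def ask(base_url, specialist, messages):
    """Activate one specialist, then send a chat completion request."""
    post_json(f"{base_url}/lora-adapters", adapter_payload(specialist))
    body = {"messages": messages, "temperature": 0.0, "max_tokens": 384}
    return post_json(f"{base_url}/v1/chat/completions", body)


print(adapter_payload("answer"))
```

When calling `ask`, `messages` would normally start with the matching system prompt from `prompts/`, since llama-server does not pin one per adapter.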

Ollama (evaluate the model before integrating)

Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.

Selora AI requires Ollama 0.30 or later (for LoRA inference) installed locally. Pick whichever install method fits your machine:

macOS / Linux / Windows: official installer (single download per platform)
macOS via Homebrew: brew install ollama
Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
Windows via Winget: winget install Ollama.Ollama

Download the base, the LoRAs, and the Modelfiles from this repo into one directory, then from that directory:

ollama create selora-qwen-command       -f Modelfile.command
ollama create selora-qwen-automation    -f Modelfile.automation
ollama create selora-qwen-answer        -f Modelfile.answer
ollama create selora-qwen-clarification -f Modelfile.clarification
ollama create selora-qwen-utilities     -f Modelfile.utilities

Each Modelfile pins the per-specialist system prompt and generation parameters, so no extra configuration is needed. The Q6_K base is stored once in Ollama's blob store and shared across all the specialists; only the ~10–40 MB LoRA adapter is added per slot — but ollama list will show one named entry per specialist.

Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.
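
For scripting, each named model can be driven through Ollama's REST API. The sketch below assumes the default `http://localhost:11434` endpoint and the `selora-qwen-<specialist>` names created above; the helper names are hypothetical. It parses the reply as JSON, which holds for the slim envelopes but not for the automation adapter's blueprint mode, which returns fenced YAML.

```python
import json
import urllib.request


def ollama_request(specialist, user_content):
    """Build a non-streaming /api/chat body for one specialist model."""
    return {
        "model": f"selora-qwen-{specialist}",
        "messages": [{"role": "user", "content": user_content}],
        "stream": False,
    }


def envelope_from_response(response):
    """Extract and parse the slim envelope from an /api/chat response."""
    return json.loads(response["message"]["content"])


def chat(specialist, user_content, host="http://localhost:11434"):
    """Send the request to a local Ollama server and return the parsed envelope."""
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(ollama_request(specialist, user_content)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return envelope_from_response(json.load(response))


print(ollama_request("command", "USER REQUEST: turn off the kitchen lights")["model"])
```

No system message is sent because each Modelfile already pins the specialist's system prompt.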

vLLM (cloud)

python -m vllm.entrypoints.openai.api_server \
  --model ./qwen3_17b_hf \
  --enable-lora --max-loras 5 --max-lora-rank 32 \
  --lora-modules \
    selora-command=/path/to/peft/command \
    selora-automation=/path/to/peft/automation \
    selora-answer=/path/to/peft/answer \
    selora-clarification=/path/to/peft/clarification \
    selora-utilities=/path/to/peft/utilities

vLLM activates the matching LoRA based on the request's model field; no extra routing layer needed.
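
As an illustration of routing through the model field, this sketch builds the chat-completions body for a given specialist. The adapter names come from the `--lora-modules` flags above; the function name `vllm_chat_body` and the decision to reject unknown specialists are assumptions of the sketch.

```python
SPECIALISTS = ("command", "automation", "answer", "clarification", "utilities")


def vllm_chat_body(specialist, messages):
    """Return an OpenAI-style request body that selects one LoRA by name."""
    if specialist not in SPECIALISTS:
        raise ValueError(f"unknown specialist: {specialist!r}")
    return {
        "model": f"selora-{specialist}",  # must match a name registered via --lora-modules
        "messages": messages,
        "temperature": 0.0,
    }


body = vllm_chat_body("clarification", [{"role": "user", "content": "turn on a light"}])
print(body["model"])
```

The body would be sent to the server's `/v1/chat/completions` endpoint, in the same way as the curl examples earlier on this page.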

Getting started in Home Assistant

A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.

1. Create a Selora Homes Connect account

A Connect account enables:

Cloud-side OAuth flows (needed by integrations that require external authentication, e.g. some appliance providers)
Optional remote-access tunnels so you can reach your home from outside the LAN
Configuration sync between multiple HA installs in the same household

The local model runs without an account; Connect is only for cloud-bridged features and remote access. If you want purely offline local AI, skip this step and revisit it later.

2. Set up Home Assistant

Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.

Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.

3. Install the Selora AI integration

The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:

Via HACS (recommended). HACS — the Home Assistant Community Store — handles updates automatically.

Install HACS itself if you don't have it: HACS install guide
In HA: HACS → Integrations → ⋮ → Custom repositories
Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
Search for Selora AI, click Install, restart Home Assistant

Manual install. Clone directly into HA's custom_components folder:

cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant

4. Download the model files

From this Hugging Face repo, get:

qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
selora-command.f16.gguf
selora-automation.f16.gguf
selora-answer.f16.gguf
selora-clarification.f16.gguf
selora-utilities.f16.gguf
The Modelfile.* files (for Ollama users; skip for llama-server users)

Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.

5. Run the model locally

Pick one runtime — both are covered in the Quick start section above:

Ollama 0.30+ — simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
llama-server — the reference runtime, full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.

Either way, the model needs to be reachable from wherever HA is running. Confirm with curl http://<host>:8080/v1/models (llama-server) or ollama list (Ollama).

6. Connect HA to Selora AI Local

In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.

The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.

7. Verify it works

Type one of these into the Selora AI chat panel that appears after setup:

turn on the kitchen light — should flip a light
what lights are on? — should list them
create an automation that turns on the porch light at sunset — should produce an automation card
turn on a light — should ask which one (if you have several)
why is the living-room sensor unavailable? — should give docs-grounded troubleshooting steps

If they all work, you're done. If any fail, see Troubleshooting at the bottom of this page.

What's new in v0.4.8

New `utilities` specialist (slot 4)

A fifth specialist handles docs-grounded help — maintenance (pending updates, version conflicts), troubleshooting (why a device is unavailable / offline / not responding), and setup help (how to add or configure an integration). It is given the user question, the AVAILABLE ENTITIES list, and a RELEVANT DOCS list of retrieved Home Assistant documentation chunks, and returns a compact envelope:

{"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]}

r is advice grounded in the retrieved docs with {entity_id} placeholders for any live-state references; q lists the entities whose live state the advice depends on (so the consumer substitutes current values); src cites the doc chunks the advice is drawn from. The fix is grounded in the docs, never in entity state alone — the state is the signal, the docs supply the explanation.

Slot order

The specialist slot contract is now 0=command, 1=automation, 2=answer, 3=clarification, 4=utilities. Backends that hot-swap LoRAs (llama-server's /lora-adapters, vLLM --enable-lora) load all five at startup.

Entity-block format reconciled with the integration

format_entities_block in scripts/gen_utils.py emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:

AVAILABLE ENTITIES:
  - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
  - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun

This keeps train-vs-inference entity-context blocks in lock-step so the model stays in-distribution.
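
When calling the model directly, the block has to be rebuilt by hand. The following is a minimal sketch that reproduces the per-line shape shown above; it is not the repo's `format_entities_block` or the integration's `_format_entity_line`, and the function names and the tuple input format are assumptions.

```python
def sketch_entity_line(entity_id, state, friendly_name):
    """One entity line in the shape shown in the sample block."""
    return f"  - entity_id={entity_id}; state={state}; friendly_name={friendly_name}"


def sketch_entities_block(entities):
    """Join the header and one line per (entity_id, state, friendly_name) tuple."""
    lines = ["AVAILABLE ENTITIES:"]
    lines.extend(sketch_entity_line(*entity) for entity in entities)
    return "\n".join(lines)


block = sketch_entities_block(
    [
        ("light.kitchen", "off", "Kitchen Lights"),
        ("sensor.sun", "below_horizon", "Sun"),
    ]
)
print(block)
```

Any drift in separators or indentation moves the prompt away from the training format, so it is worth comparing the output against the integration's own formatter.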

Inference

The manifest carries runtime.cache_prompt = true so the hub starts llama-server with system-prompt KV caching enabled, amortizing the per-specialist system prompt across requests.

Generation parameters

{
  "temperature": 0.0,
  "repeat_penalty": 1.0,
  "repeat_last_n": 256,
  "max_tokens": 384,
  "stop": ["<|im_end|>", "<|endoftext|>"]
}

Bump max_tokens to 1536 for automation requests (longer JSON output).
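
One way to apply the per-specialist override in a script is sketched below. The parameter values are the ones listed above; the function name `generation_params` is hypothetical.

```python
import copy

BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.0,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist):
    """Return the generation parameters, with a larger budget for automation."""
    params = copy.deepcopy(BASE_PARAMS)  # avoid sharing the stop list between calls
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params


print(generation_params("automation")["max_tokens"])
print(generation_params("command")["max_tokens"])
```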

Training

Base: Qwen3 1.7B, fine-tuned with Apple mlx-lm. Each specialist has its own LoRA (rank 8–24, scale 20) trained on a curated HA-domain corpus (forum threads, HA docs, synthetic command / automation pairs). System prompts are trained per specialist; see prompts/. The answer adapter is trained to emit the slim {r,q} shape (a q list of entities to look up alongside its {entity_id}-templated response), which the integration resolves against live state; see prompts/answer_system_prompt.txt and the Modelfile.answer SYSTEM block. The utilities adapter is trained on docs-grounded maintenance / troubleshooting / setup examples with retrieved-doc context and citation provenance.

Training examples

Every specialist trains on the same wrapper — its system prompt, then a user turn carrying USER REQUEST and the AVAILABLE ENTITIES block (plus RELEVANT DOCS for utilities) — with the target slim envelope as the completion. One real pair per specialist (entity / doc lists trimmed; requests and target envelopes verbatim):

# command
USER REQUEST: resume the living room TV
AVAILABLE ENTITIES:
  - entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"
→ {"c":[{"s":"media_player.media_play","e":"media_player.living_room_tv"}],"r":"Playing on living room TV."}

# answer
USER REQUEST: are hallway light and coffee maker on
AVAILABLE ENTITIES:
  - entity_id=light.hallway; state=off; friendly_name="hallway light"
  - entity_id=switch.coffee_maker; state=on; friendly_name="coffee maker"
→ {"q":["light.hallway","switch.coffee_maker"],"r":"Hallway light: {light.hallway}, coffee maker: {switch.coffee_maker}."}

# clarification
USER REQUEST: can you do something
AVAILABLE ENTITIES:
  - entity_id=light.entry; state=off; friendly_name="entry light"
  - entity_id=media_player.living_room_tv; state=idle; friendly_name="living room TV"
→ {"q":"Could you tell me what you'd like to do?"}

# automation
USER REQUEST: activate movie night every day at 7pm
AVAILABLE ENTITIES:
  - entity_id=scene.movie_night; friendly_name="movie night"
→ {"intent":"automation","response":"At **7pm** every day, this activates **movie night**. Want me to skip weekends?","description":"Activate scene scene.movie_night at 7pm.","automation":{"alias":"Auto-Movie Night","description":"Activate movie night at 7pm daily.","triggers":[{"trigger":"time","at":"19:00:00"}],"conditions":[],"actions":[{"service":"scene.turn_on","target":{"entity_id":"scene.movie_night"},"data":{}}]}}

# utilities
USER REQUEST: guide me through adding HACS
RELEVANT DOCS:
  - [hacs] Setting up the HACS integration (https://hacs.xyz/docs/use/configuration/basic/) — "Follow these steps to set up the HACS integration and authenticate it with GitHub…"
AVAILABLE ENTITIES:
  - entity_id=light.entry; state=off; friendly_name="entry light"
→ {"r":"To add HACS, go to Settings > Devices & Services > Add Integration and search for it, then follow the config flow. The integration's documentation page covers the prerequisites and step-by-step setup.","src":["hacs:https://hacs.xyz/docs/use/configuration/basic/#2","hacs:https://hacs.xyz/docs/use/configuration/options/#2"]}
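
The user turn in these pairs can be assembled with a short helper. This sketch follows the section order visible in the examples above (USER REQUEST, then RELEVANT DOCS for utilities, then AVAILABLE ENTITIES); the function name `build_user_turn` is hypothetical, and the exact whitespace of the real training wrapper is an assumption.

```python
def build_user_turn(request, entity_lines, doc_lines=None):
    """Assemble the user turn from a request plus pre-formatted entity and doc lines."""
    parts = [f"USER REQUEST: {request}"]
    if doc_lines:
        parts.append("RELEVANT DOCS:")
        parts.extend(f"  - {line}" for line in doc_lines)
    parts.append("AVAILABLE ENTITIES:")
    parts.extend(f"  - {line}" for line in entity_lines)
    return "\n".join(parts)


turn = build_user_turn(
    "resume the living room TV",
    ['entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"'],
)
print(turn)
```

The matching system prompt from `prompts/` goes in a separate system message; only the text built here belongs in the user message.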

Evaluation

Selora AI is evaluated on the open allenporter/home-assistant-datasets suite — the harness behind the Home Assistant LLM leaderboard, so its results are comparable to other models' conversation agents. The headline numbers are in Results at the top of the card; what matters here is what each surface actually grades — each scores a different slice of the stack:

The LoRA adapter — each specialist is a small LoRA trained on the slim-envelope schema: one strict, compact output contract per intent (the service calls to run, the entities to read, the question to ask, or the blueprint to build), designed up front and baked into every training example. Learning that single fixed shape per intent is what lets the adapter emit it reliably — and a malformed envelope is a miss on every surface, so this training is what every number ultimately grades.
The integration is in the loop for assist, assist-mini, and questions. The adapter emits the envelope; _convert_slim_shape runs the command calls as real Home Assistant service calls and resolves the answer entity list into live-state answers; the metric then grades the resulting device state / produced answer. So those three are end-to-end scores — the adapter plus the integration's parser and routing, exactly how a user experiences it.
The raw model is graded directly for automations. The automation adapter's blueprint YAML is the scored artifact, so the integration is bypassed — routing it through would wrap the YAML in a prose summary, the wrong surface for authoring. The 66.7% is the bare adapter; the open misses are light_on_door, where it emits a 2 * 60 duration the Home Assistant loader rejects.

Files in this bundle

| Artifact | Purpose | Distribution |
| --- | --- | --- |
| `qwen3_17b_base.Q6_K.gguf` | Q6_K base for Ollama / llama.cpp | Hugging Face, ollama.com |
| `selora-{intent}.f16.gguf` (×5) | Specialist LoRA adapters | Hugging Face, ollama.com |
| `Modelfile.{intent}` | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| `prompts/{intent}_system_prompt.txt` | Plain-text trained prompts (reference / testing) | this repo |

The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.

First-run verification

Five prompts — one per specialist — let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):

| Prompt | Specialist | Expected behaviour |
| --- | --- | --- |
| `turn on the kitchen light` | command | Light flips on; response: `"Kitchen light on."` |
| `what lights are on?` | answer | List of currently-on lights with `[[entities:...]]` markers |
| `create an automation that turns on the porch light at sunset` | automation | Automation card with `trigger: sun, event: sunset` and the porch light target |
| `turn on a light` (with multiple lights present) | clarification | Asks which one and offers options |
| `why is the living-room sensor unavailable?` | utilities | Docs-grounded troubleshooting steps citing source docs |

A clean run on all five means the LoRAs loaded, the classifier is routing correctly, and the v0.4.8 training format is reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| `Selora AI Local` not in provider dropdown | Probe couldn't reach any host candidate | Verify `curl http://localhost:8080/v1/models` works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS) |
| Chat returns empty / repeats one token | `repeat_penalty != 1.0` somewhere | Confirm llama-server is started without an override, or that the Modelfile's `PARAMETER repeat_penalty 1.0` line wasn't edited out |
| Wrong specialist responds (e.g. answer for a command) | Hot-swap call hasn't fired | Check HA logs for `Activating LoRA slot N`; if absent, the integration didn't classify the prompt as that intent, so file an issue with the prompt text |
| Model invents entity_ids that don't exist | AVAILABLE ENTITIES block not being sent | The integration sends this automatically; if you're hitting the model directly, mirror the integration's `_format_entity_line` output exactly (see "Entity-block format reconciled with the integration" above) |
| `ollama run` works but HA can't reach it | Ollama default `localhost:11434`, llama-server `0.0.0.0:8080` (different ports) | Either point the integration at port `11434` (Ollama path) or run llama-server explicitly. The integration probes `:8080` first |
| Pipeline hangs for 30s on automation prompts | Pre-v0.4.7 build of the integration | Update the integration to current `main` |

For deeper issues, enable the integration's debug log by setting `custom_components.selora_ai: debug` under `logger:` → `logs:` in configuration.yaml. It prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.

Citation

@misc{selora-ai-2026,
  title  = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
  author = {{Selora Homes}},
  year   = {2026},
  url    = {https://huggingface.co/selorahomes/Selora-AI}
}

License

Apache-2.0

Downloads last month: 1,604
Format: GGUF
Model size: 2B params
Architecture: qwen3
Hardware compatibility: 6-bit, 16-bit
Model tree for selorahomes/Selora-AI: Qwen/Qwen3-1.7B-Base (base model) → Qwen/Qwen3-1.7B (finetuned) → this model (adapter)