Selora Homes: selorahomes.com
Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai
Selora AI
Selora AI is an instruction-tuned language model for Home Assistant, the open-source smart home platform. Rather than one general model, it is five small, single-purpose LoRA specialists. Each emits one strict, compact JSON "slim envelope" for its intent, and the Selora AI integration executes that envelope against your home: running service calls, resolving live state, or saving an automation (see Specialists for the exact shapes). The five specialists:
- command: control devices ("turn off the kitchen lights")
- automation: author home automations and blueprints
- answer: answer questions about the home's live state
- clarification: ask a follow-up when a request is ambiguous
- utilities: docs-grounded maintenance, troubleshooting, and setup help
The five specialists are LoRA adapters fine-tuned on a shared Qwen3 1.7B base.
Base: Qwen3 1.7B · Format: GGUF Q6_K base + 5 per-specialist LoRA adapters (F16) · License: Apache-2.0
Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM. It is built for self-hosted IoT deployments that stay private and offline-first.
Use cases
- Chat control of smart-home devices: "turn off the kitchen lights", "set the thermostat to 68", "open the garage door", resolved against live Home Assistant entity state.
- Natural-language home automation creation: describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML as a draft you review before it's saved.
- Scene and routine orchestration: chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
- Q&A about your home: "is the laundry running?", "what's the temperature upstairs?" The answer adapter returns the entities to check plus a response template, and the integration fills in the live state.
- Docs-grounded help: "why is the living-room sensor unavailable?", "how do I add the Hue integration?", "which integrations have pending updates?" These are answered from retrieved Home Assistant documentation, with live state pulled in via entity placeholders and citations back to the source docs.
- Privacy-first home assistant: runs entirely on local hardware (Raspberry Pi 5, Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.
Specialists
Every specialist LoRA emits a compact "slim envelope": a small JSON object whose
high-frequency keys are single characters to save tokens. That envelope is an
intermediate representation, not a finished action. The Selora AI integration's
_convert_slim_shape parser turns it into real Home Assistant activity: run a
downloaded GGUF on its own and you get the envelope back; it takes the integration (or
an equivalent parser) to execute service calls, resolve live state, build YAML, or
attach citations. By design the model never fabricates device state. It emits the
calls to make, the entities to look up, the question to ask, or the automation to
build, and the integration completes the loop against live HA.
| Adapter | Emits (raw model output) | What the integration does with it |
|---|---|---|
| command | {"c":[{"s":"<service>","e":"<entity_id>","d":{…}}],"r":"<confirmation>"} | Executes each c as an HA service call, in array order, then shows r. The model does not call services itself. |
| answer | {"r":"<text with {entity_id} placeholders>","q":["<entity_id>",…]} | Looks up the q entities' live state and substitutes it into the {entity_id} placeholders in r. The model templates state; it never reads it. |
| clarification | {"q":"<question>","o":["<option>",…]} | Surfaces the question and the optional quick-reply o options; the user's next reply is the action. |
| utilities | {"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]} | Substitutes live q state into the advice and surfaces the src citations. Advice is grounded in a RELEVANT DOCS block injected at inference. |
| automation | Blueprint request → a fenced yaml HA blueprint (blueprint: / input: / !input). Concrete request → {"intent":"automation","response":…,"automation":{"alias":…,"triggers":[…],"conditions":[…],"actions":[…]}} | Uses the blueprint YAML (nearly) as-is, or maps the concrete automation object JSON→YAML into a saved HA automation. |
Key map, applied once on the HA side: r=response, q=query / entity list, c=calls,
s=service, e=entity_id, d=data (service params), o=options.
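The key map can be illustrated with a small expander. This is a sketch only, not the integration's _convert_slim_shape: the function name expand and the long names chosen for q ("entities" for a list, "question" for a string) are assumptions made for illustration.

```python
import json

# Short key -> long name, as listed in the key map above.
KEY_MAP = {"r": "response", "c": "calls", "s": "service",
           "e": "entity_id", "d": "data", "o": "options"}


def expand(envelope: dict) -> dict:
    """Return a copy of a slim envelope with long key names (hypothetical helper)."""
    out = {}
    for key, value in envelope.items():
        if key == "q":
            # q is an entity list for answer/utilities and a question string for
            # clarification, so this sketch names it by the type of its value.
            out["entities" if isinstance(value, list) else "question"] = value
        elif key == "c":
            # Each call carries its own short keys (s, e, d).
            out["calls"] = [{KEY_MAP.get(k, k): v for k, v in call.items()}
                            for call in value]
        else:
            out[KEY_MAP.get(key, key)] = value
    return out


raw = '{"c":[{"s":"light.turn_off","e":"light.kitchen"}],"r":"Kitchen light off."}'
print(expand(json.loads(raw)))
```

Keys outside the map (such as src) pass through unchanged, so the same helper covers the utilities envelope.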
The integration's selora_local provider classifies each request to one specialist (a
regex pre-classifier), then activates the matching adapter: llama-server's
/lora-adapters hot-swap on the production hub, or vLLM --enable-lora.
Example envelopes
Real output for each specialist. This is the raw model emission; the integration then executes or resolves it:
// command: "Turn off the kitchen light and lock the front door"
{"c":[{"s":"light.turn_off","e":"light.kitchen"},{"s":"lock.lock","e":"lock.front_door"}],"r":"Kitchen light off and front door locked."}
// answer: "Is the bedroom light on?"
{"q":["light.bedroom"],"r":"The bedroom light is {light.bedroom}."}
// clarification: "Turn on a light" (when several exist)
{"q":"Which light - kitchen, bedroom, or living room?","o":["kitchen","bedroom","living room"]}
// utilities: "Why does the Hue bridge show an update?"
{"r":"The {update.philips_hue} update is pending; install it from Settings → Devices & Services → Updates.","q":["update.philips_hue"],"src":["home-assistant/updates#install"]}
// automation (concrete): "Lock the front door at 10pm if it's still unlocked"
{"intent":"automation","response":"Created the 10pm auto-lock.","automation":{"alias":"Nightly Auto-Lock","triggers":[{"trigger":"time","at":"22:00:00"}],"conditions":[{"condition":"state","entity_id":"lock.front_door","state":"unlocked"}],"actions":[{"service":"lock.lock","entity_id":"lock.front_door"}]}}
The automation adapter's blueprint mode instead emits a fenced yaml block (a
complete HA blueprint with blueprint:, input:, and !input wiring) rather than the
concrete JSON shown above.
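To make the answer envelope concrete, the sketch below fills the {entity_id} placeholders from a state lookup. It is an illustration under assumptions, not the integration's parser: resolve_answer, the plain dict of states, and the "unknown" fallback are all hypothetical.

```python
import json
import re

# Matches placeholders such as {light.bedroom}: a domain, a dot, an object id.
PLACEHOLDER = re.compile(r"\{([a-z_]+\.[a-z0-9_]+)\}")


def resolve_answer(envelope: dict, states: dict) -> str:
    """Substitute live state into r for the entities listed in q (hypothetical helper)."""
    allowed = set(envelope.get("q", []))

    def fill(match: re.Match) -> str:
        entity_id = match.group(1)
        if entity_id in allowed and entity_id in states:
            return states[entity_id]
        return "unknown"  # assumed fallback when no state is available

    return PLACEHOLDER.sub(fill, envelope["r"])


envelope = json.loads('{"q":["light.bedroom"],"r":"The bedroom light is {light.bedroom}."}')
print(resolve_answer(envelope, {"light.bedroom": "on"}))
```

Restricting substitution to entities named in q keeps the model's template from pulling in state it did not ask for; whether the real integration enforces this is not stated in the card.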
Why split into specialist LoRAs
The split is what makes high task accuracy attainable on small (1.7B) weights, and it is the reason the model behaves well on the evaluation surfaces below:
- Contract-first, by construction. The slim-envelope schema was designed up front and baked into every training example, so each specialist learns one clean, deterministic output shape: exactly what the integration needs to parse, and what lets the router fire the right adapter.
- One job per adapter. Each LoRA learns a single tightly-constrained output contract (its slim envelope) rather than a general chat distribution, so it can saturate that one task instead of trading off across many.
- Intent, not execution. Execution, live-state lookup, and YAML construction are the integration's responsibility. The model never invents state or device values (it templates them), which removes a whole class of hallucination failures.
- Slim envelopes are short. Single-char-keyed JSON means fewer tokens to mispredict and faster generation, and the rigid shape is trivial for the integration to parse and validate.
- Shared base, hot-swapped adapters. One Q6_K base (~1.6 GB) loads once; the right ~10-40 MB adapter is activated per request, so five specialists cost roughly the memory of a single model and every request is served by the adapter trained for it.
- Multi-state entity context (v0.4.8). Per-entity attribute tails in the AVAILABLE ENTITIES block give each specialist richer single-turn grounding.
Results
Evaluated on the open allenporter/home-assistant-datasets suite, the basis of the Home Assistant LLM leaderboard (v0.4.8, temperature 0):
| Surface | Score | Pass / total | 95% CI | Scored on |
|---|---|---|---|---|
| assist | 86.2% | 399 / 463 | ±3.1 | raw model + integration |
| assist-mini | 90.3% | 177 / 196 | ±4.1 | raw model + integration |
| questions | 94.3% | 349 / 370 | ±2.4 | raw model + integration |
| automations | 66.7% | 40 / 60 | ±11.9 | raw model |
See Evaluation for what each surface measures: the LoRA adapter vs the integration vs the raw model.
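The card does not say how the confidence intervals were computed. As a check, the sketch below assumes a normal-approximation (Wald) interval on the pass rate, which reproduces the scores and half-widths in the table; wald_interval is a name chosen here for illustration.

```python
import math


def wald_interval(passed: int, total: int, z: float = 1.96) -> tuple:
    """Return (score in %, 95% half-width in percentage points).

    Assumes a normal approximation to the binomial; the card does not
    state which interval method was actually used.
    """
    p = passed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return round(100 * p, 1), round(100 * half_width, 1)


for surface, passed, total in [("assist", 399, 463), ("assist-mini", 177, 196),
                               ("questions", 349, 370), ("automations", 40, 60)]:
    print(surface, wald_interval(passed, total))
```

The wide interval on automations follows directly from its small sample of 60 cases.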
Quick start
You have a choice in how you start with Selora AI:
- Ready to deploy with Home Assistant? Use llama-server, the runtime the HA integration is built around.
- Want to evaluate the model first? Use Ollama: try each specialist on your machine, smoke-test the LoRAs on your hardware, and decide whether Selora AI is right for you before committing to the full Home Assistant integration.
- Serving in the cloud? Use vLLM.
llama-server (Home Assistant integration runtime)
The reference runtime: what the model was trained against and what the Home Assistant integration uses. llama-server's /lora-adapters endpoint is the in-process LoRA hot-swap that lets the integration pick a specialist per turn without reloading the base.
Download the base and all five LoRA files into a single directory, then:
llama-server \
--model qwen3_17b_base.Q6_K.gguf \
--lora-init-without-apply \
--lora selora-command.f16.gguf \
--lora selora-automation.f16.gguf \
--lora selora-answer.f16.gguf \
--lora selora-clarification.f16.gguf \
--lora selora-utilities.f16.gguf \
--ctx-size 8192
POST to /lora-adapters to switch the active LoRA before each
/v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
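The switch-then-chat sequence might look like the sketch below. It assumes the /lora-adapters request body described in the llama.cpp server documentation (a list of {"id", "scale"} objects, with ids following the order of the --lora flags); verify this against your build. The helper names and the base URL are illustrative, and the system prompt should come from the matching file in prompts/.

```python
import json
import urllib.request

# Slot order follows the --lora flags in the command above.
SLOTS = ["command", "automation", "answer", "clarification", "utilities"]


def adapter_payload(specialist: str) -> list:
    """Scale 1.0 for the chosen adapter, 0.0 for every other slot."""
    active = SLOTS.index(specialist)
    return [{"id": i, "scale": 1.0 if i == active else 0.0}
            for i in range(len(SLOTS))]


def post_json(url: str, body) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    request = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read()


def ask(base_url: str, specialist: str, system_prompt: str, user_text: str) -> str:
    """Activate one specialist, then send a single chat turn to it."""
    post_json(f"{base_url}/lora-adapters", adapter_payload(specialist))
    reply = post_json(f"{base_url}/v1/chat/completions", {
        "messages": [{"role": "system", "content": system_prompt},
                     {"role": "user", "content": user_text}],
        "temperature": 0.0,
        "max_tokens": 384,
    })
    return json.loads(reply)["choices"][0]["message"]["content"]


# Example (needs a running server, so it is left commented out):
# print(ask("http://localhost:8080", "command", system_prompt, user_turn))
```

Because the adapter setting is server-wide state, concurrent callers would need to serialize the switch and the completion; this sketch assumes a single client.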
Ollama (evaluate the model before integrating)
Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.
Selora requires Ollama 0.30 or later (for LoRA inference) installed locally. Pick whichever fits your machine:
- macOS / Linux / Windows: official installer (single download per platform)
- macOS via Homebrew: brew install ollama
- Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
- Windows via Winget: winget install Ollama.Ollama
Download the base, the LoRAs, and the Modelfiles from this repo into one directory, then from that directory:
ollama create selora-qwen-command -f Modelfile.command
ollama create selora-qwen-automation -f Modelfile.automation
ollama create selora-qwen-answer -f Modelfile.answer
ollama create selora-qwen-clarification -f Modelfile.clarification
ollama create selora-qwen-utilities -f Modelfile.utilities
Each Modelfile pins the per-specialist system prompt and generation parameters,
so no extra configuration is needed. The Q6_K base is stored once in Ollama's
blob store and shared across all the specialists; only the ~10-40 MB LoRA
adapter is added per slot, but ollama list will show one named entry per
specialist.
Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.
vLLM (cloud)
python -m vllm.entrypoints.openai.api_server \
--model ./qwen3_17b_hf \
--enable-lora --max-loras 5 --max-lora-rank 32 \
--lora-modules \
selora-command=/path/to/peft/command \
selora-automation=/path/to/peft/automation \
selora-answer=/path/to/peft/answer \
selora-clarification=/path/to/peft/clarification \
selora-utilities=/path/to/peft/utilities
vLLM activates the matching LoRA based on the request's model field;
no extra routing layer needed.
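A request body that selects a specialist through the model field could be built as below. The adapter names come from the --lora-modules flags above; chat_request is a hypothetical helper, and the system prompt placeholder stands in for the matching file in prompts/.

```python
import json


def chat_request(specialist: str, system_prompt: str, user_text: str) -> dict:
    """Build an OpenAI-style chat body whose model field names the LoRA module."""
    return {
        "model": f"selora-{specialist}",  # e.g. selora-command, as registered above
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


body = chat_request("command", "<system prompt>", "turn off the kitchen lights")
print(json.dumps(body, indent=2))
```

POST this body to the server's /v1/chat/completions endpoint; choosing which specialist to name is still the caller's job.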
Getting started in Home Assistant
A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.
1. Create a Selora Homes Connect account
Sign up at selorahomes.com/connect. The account ties your local install to:
- Cloud-side OAuth flows (needed by integrations that require external authentication, e.g. some appliance providers)
- Optional remote-access tunnels so you can reach your home from outside the LAN
- Configuration sync between multiple HA installs in the same household
The local model runs without an account; Connect is for cloud-bridged features and remote access. If you only want offline local AI, you can skip this step and revisit later.
2. Set up Home Assistant
Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.
Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.
3. Install the Selora AI integration
The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:
Via HACS (recommended). HACS, the Home Assistant Community Store, handles updates automatically.
- Install HACS itself if you don't have it: HACS install guide
- In HA: HACS → Integrations → ⋮ → Custom repositories
- Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
- Search for Selora AI, click Install, restart Home Assistant
Manual install. Clone directly into HA's custom_components folder:
cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant
4. Download the model files
From this HuggingFace repo, get:
- qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
- selora-command.f16.gguf
- selora-automation.f16.gguf
- selora-answer.f16.gguf
- selora-clarification.f16.gguf
- selora-utilities.f16.gguf
- The Modelfile.* files (for Ollama users; skip for llama-server users)
Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.
5. Run the model locally
Pick one runtime; both are covered in the Quick start section above:
- Ollama 0.30+: simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
- llama-server: the reference runtime, full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.
Either way, the model needs to be reachable from wherever HA is running. Confirm with curl http://<host>:8080/v1/models (llama-server) or ollama list (Ollama).
6. Connect HA to Selora AI Local
In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.
The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.
7. Verify it works
Type one of these into the Selora AI chat panel that appears after setup:
- turn on the kitchen light → should flip a light
- what lights are on? → should list them
- create an automation that turns on the porch light at sunset → should produce an automation card
- turn on a light → should ask which one (if you have several)
- why is the living-room sensor unavailable? → should give docs-grounded troubleshooting steps
If they all work, you're done. If any fail, see Troubleshooting at the bottom of this page.
What's new in v0.4.8
New utilities specialist (slot 4)
A fifth specialist handles docs-grounded help: maintenance (pending updates,
version conflicts), troubleshooting (why a device is unavailable / offline /
not responding), and setup help (how to add or configure an integration). It is
given the user question, the AVAILABLE ENTITIES list, and a RELEVANT DOCS
list of retrieved Home Assistant documentation chunks, and returns a compact
envelope:
{"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]}
r is advice grounded in the retrieved docs with {entity_id} placeholders for
any live-state references; q lists the entities whose live state the advice
depends on (so the consumer substitutes current values); src cites the doc
chunks the advice is drawn from. The fix is grounded in the docs, never in entity
state alone: the state is the signal, the docs supply the explanation.
Slot order
The specialist slot contract is now 0=command, 1=automation, 2=answer, 3=clarification, 4=utilities. Backends that hot-swap LoRAs (llama-server's
/lora-adapters, vLLM --enable-lora) load all five at startup.
Entity-block format reconciled with the integration
format_entities_block in scripts/gen_utils.py emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:
AVAILABLE ENTITIES:
- entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
- entity_id=sensor.sun; state=below_horizon; friendly_name=Sun
This keeps train-vs-inference entity-context blocks in lock-step so the model stays in-distribution.
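When calling the model directly, the block can be reproduced with a few lines. The sketch below follows only the per-line shape printed above; it is not the integration's _format_entity_line, and the function names and tuple input are assumptions.

```python
def entity_line_sketch(entity_id: str, state: str, friendly_name: str) -> str:
    """One line in the shape shown above (illustrative re-implementation)."""
    return f"- entity_id={entity_id}; state={state}; friendly_name={friendly_name}"


def entities_block_sketch(entities: list) -> str:
    """Join a header and one line per (entity_id, state, friendly_name) tuple."""
    lines = ["AVAILABLE ENTITIES:"]
    lines += [entity_line_sketch(*entity) for entity in entities]
    return "\n".join(lines)


print(entities_block_sketch([
    ("light.kitchen", "off", "Kitchen Lights"),
    ("sensor.sun", "below_horizon", "Sun"),
]))
```

Any extra per-entity attributes the integration appends are not covered here; check sanitize.py for the authoritative format.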
Inference
The manifest carries runtime.cache_prompt = true so the hub starts
llama-server with system-prompt KV caching enabled, amortizing the
per-specialist system prompt across requests.
Generation parameters
{
"temperature": 0.0,
"repeat_penalty": 1.0,
"repeat_last_n": 256,
"max_tokens": 384,
"stop": ["<|im_end|>", "<|endoftext|>"]
}
Bump max_tokens to 1536 for automation requests (longer JSON output).
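A per-specialist parameter picker is one way to apply that rule. The values are the ones listed above; generation_params is a hypothetical helper, and parameter names may differ between runtimes.

```python
# Defaults copied from the Generation parameters block above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.0,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist: str) -> dict:
    """Return a fresh parameter dict, with the longer budget for automation."""
    params = dict(BASE_PARAMS)
    params["stop"] = list(BASE_PARAMS["stop"])  # avoid sharing the list
    if specialist == "automation":
        params["max_tokens"] = 1536
    return params


print(generation_params("automation")["max_tokens"])
print(generation_params("command")["max_tokens"])
```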
Training
Base: Qwen3 1.7B fine-tuned with Apple mlx-lm. Each
specialist has its own LoRA (rank 8-24, scale 20) trained on a curated
HA-domain corpus (forum threads, HA docs, synthetic command /
automation pairs). System prompts are trained per specialist; see
prompts/. The answer adapter is trained to emit the slim
{r,q} shape (a q list of entities to look up alongside its
{entity_id}-templated response), which the integration resolves against
live state; see prompts/answer_system_prompt.txt and the
Modelfile.answer SYSTEM block. The utilities adapter is trained on
docs-grounded maintenance / troubleshooting / setup examples with
retrieved-doc context and citation provenance.
Training examples
Every specialist trains on the same wrapper (its system prompt, then a user turn carrying
USER REQUEST and the AVAILABLE ENTITIES block, plus RELEVANT DOCS for utilities),
with the target slim envelope as the completion. One real pair per specialist (entity / doc
lists trimmed; requests and target envelopes verbatim):
# command
USER REQUEST: resume the living room TV
AVAILABLE ENTITIES:
- entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"
→ {"c":[{"s":"media_player.media_play","e":"media_player.living_room_tv"}],"r":"Playing on living room TV."}
# answer
USER REQUEST: are hallway light and coffee maker on
AVAILABLE ENTITIES:
- entity_id=light.hallway; state=off; friendly_name="hallway light"
- entity_id=switch.coffee_maker; state=on; friendly_name="coffee maker"
→ {"q":["light.hallway","switch.coffee_maker"],"r":"Hallway light: {light.hallway}, coffee maker: {switch.coffee_maker}."}
# clarification
USER REQUEST: can you do something
AVAILABLE ENTITIES:
- entity_id=light.entry; state=off; friendly_name="entry light"
- entity_id=media_player.living_room_tv; state=idle; friendly_name="living room TV"
→ {"q":"Could you tell me what you'd like to do?"}
# automation
USER REQUEST: activate movie night every day at 7pm
AVAILABLE ENTITIES:
- entity_id=scene.movie_night; friendly_name="movie night"
→ {"intent":"automation","response":"At **7pm** every day, this activates **movie night**. Want me to skip weekends?","description":"Activate scene scene.movie_night at 7pm.","automation":{"alias":"Auto-Movie Night","description":"Activate movie night at 7pm daily.","triggers":[{"trigger":"time","at":"19:00:00"}],"conditions":[],"actions":[{"service":"scene.turn_on","target":{"entity_id":"scene.movie_night"},"data":{}}]}}
# utilities
USER REQUEST: guide me through adding HACS
RELEVANT DOCS:
- [hacs] Setting up the HACS integration (https://hacs.xyz/docs/use/configuration/basic/) - "Follow these steps to set up the HACS integration and authenticate it with GitHub…"
AVAILABLE ENTITIES:
- entity_id=light.entry; state=off; friendly_name="entry light"
→ {"r":"To add HACS, go to Settings > Devices & Services > Add Integration and search for it, then follow the config flow. The integration's documentation page covers the prerequisites and step-by-step setup.","src":["hacs:https://hacs.xyz/docs/use/configuration/basic/#2","hacs:https://hacs.xyz/docs/use/configuration/options/#2"]}
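The user turn in these examples can be assembled mechanically. The sketch below mirrors the section order as printed (USER REQUEST, then RELEVANT DOCS when present, then AVAILABLE ENTITIES); build_user_turn is a hypothetical helper, and the exact whitespace the adapters were trained on should be taken from prompts/ and the Modelfiles rather than from this sketch.

```python
def build_user_turn(request: str, entity_lines: list, doc_lines: list = None) -> str:
    """Assemble a user turn in the layout shown in the examples above."""
    parts = [f"USER REQUEST: {request}"]
    if doc_lines:
        # Only the utilities specialist receives a RELEVANT DOCS section.
        parts.append("RELEVANT DOCS:")
        parts.extend(doc_lines)
    parts.append("AVAILABLE ENTITIES:")
    parts.extend(entity_lines)
    return "\n".join(parts)


print(build_user_turn(
    "resume the living room TV",
    ['- entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"'],
))
```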
Evaluation
Selora AI is evaluated on the open allenporter/home-assistant-datasets suite, the harness behind the Home Assistant LLM leaderboard, so its results are comparable to other models' conversation agents. The headline numbers are in Results at the top of the card; what matters here is what each surface actually grades, since each scores a different slice of the stack:
- The LoRA adapter. Each specialist is a small LoRA trained on the slim-envelope schema: one strict, compact output contract per intent (the service calls to run, the entities to read, the question to ask, or the blueprint to build), designed up front and baked into every training example. Learning that single fixed shape per intent is what lets the adapter emit it reliably, and a malformed envelope is a miss on every surface, so this training is what every number ultimately grades.
- The integration is in the loop for assist, assist-mini, and questions. The adapter emits the envelope; _convert_slim_shape runs the command calls as real Home Assistant service calls and resolves the answer entity list into live-state answers; the metric then grades the resulting device state / produced answer. So those three are end-to-end scores: the adapter plus the integration's parser and routing, exactly how a user experiences it.
- The raw model is graded directly for automations. The automation adapter's blueprint YAML is the scored artifact, so the integration is bypassed; routing it through would wrap the YAML in a prose summary, the wrong surface for authoring. The 66.7% is the bare adapter; the open misses are light_on_door, where it emits a 2 * 60 duration the Home Assistant loader rejects.
Files in this bundle
| Artifact | Purpose | Distribution |
|---|---|---|
| qwen3_17b_base.Q6_K.gguf | Q6_K base for Ollama / llama.cpp | Hugging Face, ollama.com |
| selora-{intent}.f16.gguf (×5) | Specialist LoRA adapters | Hugging Face, ollama.com |
| Modelfile.{intent} | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| prompts/{intent}_system_prompt.txt | Plain-text trained prompts (reference / testing) | this repo |
The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.
First-run verification
Five prompts, one per specialist, let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):
| Prompt | Specialist | Expected behaviour |
|---|---|---|
| turn on the kitchen light | command | Light flips on; response: "Kitchen light on." |
| what lights are on? | answer | List of currently-on lights with [[entities:...]] markers |
| create an automation that turns on the porch light at sunset | automation | Automation card with trigger: sun, event: sunset and the porch light target |
| turn on a light (with multiple lights present) | clarification | Asks which one and offers options |
| why is the living-room sensor unavailable? | utilities | Docs-grounded troubleshooting steps citing source docs |
A clean run on all five = LoRAs loaded, classifier routing correctly, and the v0.4.8 training format reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Selora AI Local not in provider dropdown | Probe couldn't reach any host candidate | Verify curl http://localhost:8080/v1/models works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS) |
| Chat returns empty / repeats one token | repeat_penalty != 1.0 somewhere | Confirm llama-server is started without an override, or that the Modelfile's PARAMETER repeat_penalty 1.0 line wasn't edited out |
| Wrong specialist responds (e.g. answer for a command) | Hot-swap call hasn't fired | Check HA logs for Activating LoRA slot N; if absent, the integration didn't classify the prompt as that intent, so file an issue with the prompt text |
| Model invents entity_ids that don't exist | AVAILABLE ENTITIES block not being sent | The integration sends this automatically; if you're hitting the model directly, mirror the integration's _format_entity_line output exactly (see "Entity-block format reconciled with the integration" above) |
| ollama run works but HA can't reach it | Ollama default localhost:11434, llama-server 0.0.0.0:8080 (different ports) | Either point the integration at port 11434 (Ollama path) or run llama-server explicitly. The integration probes :8080 first |
| Pipeline hangs for 30s on automation prompts | Pre-v0.4.7 build of the integration | Update the integration to current main |
For deeper issues, enable the integration's debug log (set custom_components.selora_ai: debug under logger: logs: in configuration.yaml). It prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.
Citation
@misc{selora-ai-2026,
title = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
author = {{Selora Homes}},
year = {2026},
url = {https://huggingface.co/selorahomes/Selora-AI}
}
License
Apache-2.0