Selora Homes: selorahomes.com · Selora AI Home Assistant Integration: github.com/SeloraHomes/ha-selora-ai


Selora AI

Selora AI is an instruction-tuned language model for Home Assistant, the open-source smart home platform. Rather than one general model, it is five small, single-purpose LoRA specialists. Each emits one strict, compact JSON "slim envelope" for its intent, and the Selora AI integration executes that envelope against your home by running service calls, resolving live state, or saving an automation (see Specialists for the exact shapes). The five specialists:

  • command: control devices ("turn off the kitchen lights")
  • automation: author home automations and blueprints
  • answer: answer questions about the home's live state
  • clarification: ask a follow-up when a request is ambiguous
  • utilities: docs-grounded maintenance, troubleshooting, and setup help

The five specialists are LoRA adapters fine-tuned on a shared Qwen3 1.7B base.

Base: Qwen3 1.7B · Format: GGUF Q6_K base + 5 per-specialist LoRA adapters (F16) · License: Apache-2.0

Selora AI powers the Selora AI Home Assistant integration and runs locally on Apple Silicon, Linux, or Windows via llama-server or Ollama, or in the cloud via vLLM. It is built for self-hosted IoT deployments that stay private and offline-first.

Use cases

  • Chat control of smart-home devices: "turn off the kitchen lights", "set the thermostat to 68", "open the garage door", each resolved against live Home Assistant entity state.
  • Natural-language home automation creation: describe an automation in plain English ("when the front door opens after 10pm, turn on the porch light") and Selora returns valid Home Assistant YAML as a draft you review before it's saved.
  • Scene and routine orchestration: chain actions across multiple entities ("good night" → lock doors, dim bedroom lights, set thermostat) without hand-writing scripts.
  • Q&A about your home: "is the laundry running?", "what's the temperature upstairs?". The answer adapter returns the entities to check plus a response template, and the integration fills in the live state.
  • Docs-grounded help: "why is the living-room sensor unavailable?", "how do I add the Hue integration?", "which integrations have pending updates?". These are answered from retrieved Home Assistant documentation, with live state pulled in via entity placeholders and citations back to the source docs.
  • Privacy-first home assistant: runs entirely on local hardware (Raspberry Pi 5, Mac mini, NUC-class boxes) with no cloud dependency, so device commands and home telemetry never leave the LAN.

Specialists

Every specialist LoRA emits a compact "slim envelope", a small JSON object whose high-frequency keys are single characters to save tokens. That envelope is an intermediate representation, not a finished action. The Selora AI integration's _convert_slim_shape parser turns it into real Home Assistant activity: run a downloaded GGUF on its own and you get the envelope back; it takes the integration (or an equivalent parser) to execute service calls, resolve live state, build YAML, or attach citations. By design the model never fabricates device state: it emits the calls to make, the entities to look up, the question to ask, or the automation to build, and the integration completes the loop against live HA.

| Adapter | Emits (raw model output) | What the integration does with it |
| --- | --- | --- |
| command | {"c":[{"s":"<service>","e":"<entity_id>","d":{…}}],"r":"<confirmation>"} | Executes each c as an HA service call, in array order, then shows r. The model does not call services itself. |
| answer | {"r":"<text with {entity_id} placeholders>","q":["<entity_id>",…]} | Looks up the q entities' live state and substitutes it into the {entity_id} placeholders in r. The model templates state; it never reads it. |
| clarification | {"q":"<question>","o":["<option>",…]} | Surfaces the question and the optional quick-reply o options; the user's next reply is the action. |
| utilities | {"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]} | Substitutes live q state into the advice and surfaces the src citations. Advice is grounded in a RELEVANT DOCS block injected at inference. |
| automation | Blueprint request → a fenced yaml HA blueprint (blueprint: / input: / !input). Concrete request → {"intent":"automation","response":…,"automation":{"alias":…,"triggers":[…],"conditions":[…],"actions":[…]}} | Uses the blueprint YAML (nearly) as-is, or maps the concrete automation object JSON→YAML into a saved HA automation. |

Key map, applied once on the HA side: r=response, q=query / entity list, c=calls, s=service, e=entity_id, d=data (service params), o=options.
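As a rough illustration of the key map, the sketch below expands a command envelope into long-form keys. It is not the integration's _convert_slim_shape: the helper name expand_command and the output layout are assumptions made for this example, and it handles only the command shape because q means different things in the answer and clarification envelopes.

```python
import json

# Short-to-long names for the per-call keys, taken from the key map above.
CALL_KEYS = {"s": "service", "e": "entity_id", "d": "data"}


def expand_command(raw: str) -> dict:
    """Expand a slim command envelope into long-form keys (hypothetical helper)."""
    envelope = json.loads(raw)
    calls = []
    for call in envelope.get("c", []):  # preserve array order: calls run in sequence
        # An unknown key raises KeyError, which doubles as strict validation.
        calls.append({CALL_KEYS[key]: value for key, value in call.items()})
    return {"response": envelope.get("r", ""), "calls": calls}


raw = '{"c":[{"s":"light.turn_off","e":"light.kitchen"}],"r":"Kitchen light off."}'
expanded = expand_command(raw)
print(expanded["calls"])
```

Rejecting unknown keys is a design choice for the sketch; a real parser may prefer to ignore or log them.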

The integration's selora_local provider classifies each request to one specialist (a regex pre-classifier), then activates the matching adapter via llama-server's /lora-adapters hot-swap on the production hub, or vLLM --enable-lora.

Example envelopes

Real output for each specialist (the raw model emission; the integration then executes or resolves it):

// command: "Turn off the kitchen light and lock the front door"
{"c":[{"s":"light.turn_off","e":"light.kitchen"},{"s":"lock.lock","e":"lock.front_door"}],"r":"Kitchen light off and front door locked."}

// answer: "Is the bedroom light on?"
{"q":["light.bedroom"],"r":"The bedroom light is {light.bedroom}."}

// clarification: "Turn on a light" (when several exist)
{"q":"Which light — kitchen, bedroom, or living room?","o":["kitchen","bedroom","living room"]}

// utilities: "Why does the Hue bridge show an update?"
{"r":"The {update.philips_hue} update is pending; install it from Settings → Devices & Services → Updates.","q":["update.philips_hue"],"src":["home-assistant/updates#install"]}

// automation (concrete): "Lock the front door at 10pm if it's still unlocked"
{"intent":"automation","response":"Created the 10pm auto-lock.","automation":{"alias":"Nightly Auto-Lock","triggers":[{"trigger":"time","at":"22:00:00"}],"conditions":[{"condition":"state","entity_id":"lock.front_door","state":"unlocked"}],"actions":[{"service":"lock.lock","entity_id":"lock.front_door"}]}}

The automation adapter's blueprint mode instead emits a fenced yaml block (a complete HA blueprint with blueprint:, input:, and !input wiring) rather than the concrete JSON shown above.
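To make the "templates state, never reads it" idea concrete, here is a minimal sketch of how a consumer might resolve an answer envelope. The function name resolve_answer, the placeholder regex, and the "unknown" fallback are assumptions for illustration; the integration's own resolver may behave differently.

```python
import json
import re

# Matches {domain.object_id} placeholders such as {light.bedroom}.
PLACEHOLDER = re.compile(r"\{([a-z_]+\.[a-z0-9_]+)\}")


def resolve_answer(raw: str, states: dict[str, str]) -> str:
    """Fill the r template with live state for the entities listed in q."""
    envelope = json.loads(raw)
    # Only entities the model asked for in q are looked up.
    looked_up = {eid: states.get(eid, "unknown") for eid in envelope.get("q", [])}
    # Placeholders that are not in q are left untouched rather than guessed.
    return PLACEHOLDER.sub(
        lambda m: looked_up.get(m.group(1), m.group(0)), envelope["r"]
    )


raw = '{"q":["light.bedroom"],"r":"The bedroom light is {light.bedroom}."}'
print(resolve_answer(raw, {"light.bedroom": "on"}))
```

Here the states dictionary stands in for a live Home Assistant state lookup.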

Why split into specialist LoRAs

The split is what makes high task accuracy attainable on small (1.7B) weights, and it is the reason the model behaves well on the evaluation surfaces below:

  • Contract-first, by construction. The slim-envelope schema was designed up front and baked into every training example, so each specialist learns one clean, deterministic output shape: exactly what the integration needs to parse, and what lets the router fire the right adapter.
  • One job per adapter. Each LoRA learns a single tightly-constrained output contract (its slim envelope) rather than a general chat distribution, so it can saturate that one task instead of trading off across many.
  • Intent, not execution. Execution, live-state lookup, and YAML construction are the integration's responsibility. The model never invents state or device values (it templates them), which removes a whole class of hallucination failures.
  • Slim envelopes are short. Single-char-keyed JSON means fewer tokens to mispredict and faster generation, and the rigid shape is trivial for the integration to parse and validate.
  • Shared base, hot-swapped adapters. One Q6_K base (~1.6 GB) loads once; the right ~10–40 MB adapter is activated per request, so five specialists cost roughly the memory of a single model and every request is served by the adapter trained for it.
  • Multi-state entity context (v0.4.8). Per-entity attribute tails in the AVAILABLE ENTITIES block give each specialist richer single-turn grounding.

Results

Evaluated on the open allenporter/home-assistant-datasets suite, the basis of the Home Assistant LLM leaderboard (v0.4.8, temperature 0):

| Surface | Score | Pass / total | 95% CI | Scored on |
| --- | --- | --- | --- | --- |
| assist | 86.2% | 399 / 463 | ±3.1 | raw model + integration |
| assist-mini | 90.3% | 177 / 196 | ±4.1 | raw model + integration |
| questions | 94.3% | 349 / 370 | ±2.4 | raw model + integration |
| automations | 66.7% | 40 / 60 | ±11.9 | raw model |

See Evaluation for what each surface measures: the LoRA adapter, the integration, or the raw model.
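The 95% CI column is consistent with a normal-approximation (Wald) binomial interval computed from the pass counts. The card does not state which method was used, so treat that as an assumption; the sketch below simply shows that the formula reproduces the listed half-widths.

```python
import math


def wald_half_width(passed: int, total: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval, in percentage points."""
    p = passed / total
    return 100 * z * math.sqrt(p * (1 - p) / total)


# Pass / total counts copied from the results table.
results = {
    "assist": (399, 463),
    "assist-mini": (177, 196),
    "questions": (349, 370),
    "automations": (40, 60),
}
for surface, (passed, total) in results.items():
    score = 100 * passed / total
    print(f"{surface}: {score:.1f}% ±{wald_half_width(passed, total):.1f}")
```

The wide interval on automations follows directly from the small sample (60 tasks), which is worth keeping in mind when comparing that score with the others.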

Quick start

You have a choice in how you start with Selora AI:

  • Ready to deploy with Home Assistant? Use llama-server, the runtime the HA integration is built around.
  • Want to evaluate the model first? Use Ollama: try each specialist on your machine, smoke-test the LoRAs on your hardware, and decide whether Selora AI is right for you before committing to the full Home Assistant integration.
  • Serving in the cloud? Use vLLM.

llama-server (Home Assistant integration runtime)

The reference runtime: what the model was trained against and what the Home Assistant integration uses. llama-server's /lora-adapters endpoint is the in-process LoRA hot-swap that lets the integration pick a specialist per turn without reloading the base.

Download the base and all five LoRA files into a single directory, then:

llama-server \
  --model qwen3_17b_base.Q6_K.gguf \
  --lora-init-without-apply \
  --lora selora-command.f16.gguf \
  --lora selora-automation.f16.gguf \
  --lora selora-answer.f16.gguf \
  --lora selora-clarification.f16.gguf \
  --lora selora-utilities.f16.gguf \
  --ctx-size 8192

POST to /lora-adapters to switch the active LoRA before each /v1/chat/completions call. Build instructions for llama-server are in the llama.cpp build guide.
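A minimal sketch of that two-step call, using only the standard library. It assumes the adapter ids follow the order of the --lora flags above (0=command through 4=utilities) and that /lora-adapters accepts a JSON list of {"id", "scale"} objects as described in the llama.cpp server README; verify both against your build. The helper names and the base URL argument are hypothetical.

```python
import json
import urllib.request

# Assumed to match the order of the --lora flags passed to llama-server.
SLOTS = ["command", "automation", "answer", "clarification", "utilities"]


def adapter_scales(specialist: str) -> list[dict]:
    """Body for POST /lora-adapters: scale 1.0 for one slot, 0.0 for the rest."""
    active = SLOTS.index(specialist)
    return [{"id": i, "scale": 1.0 if i == active else 0.0} for i in range(len(SLOTS))]


def post(url: str, payload) -> bytes:
    """POST a JSON payload and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


def ask(base_url: str, specialist: str, system_prompt: str, user_turn: str) -> str:
    """Activate one specialist, then run a single chat completion."""
    post(f"{base_url}/lora-adapters", adapter_scales(specialist))
    body = post(f"{base_url}/v1/chat/completions", {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_turn},
        ],
        "temperature": 0.0,
        "max_tokens": 384,
    })
    return json.loads(body)["choices"][0]["message"]["content"]
```

Calling ask("http://localhost:8080", "command", system_prompt, user_turn) would return the raw slim envelope as a string; the system prompt should be the one from prompts/ for that specialist.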

Ollama (evaluate the model before integrating)

Ollama lets you try Selora AI on your machine and validate the LoRAs work before setting up the full Home Assistant integration. Useful for kicking the tyres on each specialist, smoke-testing the model on your hardware, or driving it from a script.

Selora requires Ollama 0.30 or later (for LoRA inference) installed locally. Pick whichever fits your machine:

  • macOS / Linux / Windows: official installer (single download per platform)
  • macOS via Homebrew: brew install ollama
  • Linux via shell: curl -fsSL https://ollama.com/install.sh | sh
  • Windows via Winget: winget install Ollama.Ollama

Download the base, the LoRAs, and the Modelfiles from this repo into one directory, then from that directory:

ollama create selora-qwen-command       -f Modelfile.command
ollama create selora-qwen-automation    -f Modelfile.automation
ollama create selora-qwen-answer        -f Modelfile.answer
ollama create selora-qwen-clarification -f Modelfile.clarification
ollama create selora-qwen-utilities     -f Modelfile.utilities

Each Modelfile pins the per-specialist system prompt and generation parameters, so no extra configuration is needed. The Q6_K base is stored once in Ollama's blob store and shared across all the specialists; only the ~10–40 MB LoRA adapter is added per slot, although ollama list will show one named entry per specialist.

Ollama 0.30+ does not support in-process LoRA hot-swap, so each specialist runs as its own named model. This path is best for direct chat or scripting use; for the Home Assistant integration use llama-server above.

vLLM (cloud)

python -m vllm.entrypoints.openai.api_server \
  --model ./qwen3_17b_hf \
  --enable-lora --max-loras 5 --max-lora-rank 32 \
  --lora-modules \
    selora-command=/path/to/peft/command \
    selora-automation=/path/to/peft/automation \
    selora-answer=/path/to/peft/answer \
    selora-clarification=/path/to/peft/clarification \
    selora-utilities=/path/to/peft/utilities

vLLM activates the matching LoRA based on the request's model field; no extra routing layer needed.
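For example, a request body that selects the command specialist could be built as below. This is a sketch: the helper name vllm_chat_body is hypothetical, and the body is meant to be POSTed to the server's OpenAI-compatible /v1/chat/completions endpoint.

```python
import json


def vllm_chat_body(specialist: str, system_prompt: str, user_turn: str) -> str:
    """Build a chat-completions body whose model field names a LoRA module."""
    body = {
        # Must match a name registered with --lora-modules, e.g. selora-command.
        "model": f"selora-{specialist}",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_turn},
        ],
        "temperature": 0.0,
    }
    return json.dumps(body)


print(vllm_chat_body("command", "<system prompt>", "USER REQUEST: turn off the kitchen lights"))
```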

Getting started in Home Assistant

A walk-through from zero to "Selora AI is answering me in Home Assistant." If you already have HA running and just want to plug in the model, skip to step 4.

1. Create a Selora Homes Connect account

Sign up at selorahomes.com/connect. The account ties your local install to:

  • Cloud-side OAuth flows (needed by integrations that require external authentication, e.g. some appliance providers)
  • Optional remote-access tunnels so you can reach your home from outside the LAN
  • Configuration sync between multiple HA installs in the same household

The local model runs without an account; Connect is for cloud-bridged features and remote access. If you only want offline local AI, you can skip this step and revisit it later.

2. Set up Home Assistant

Install HA on a Pi, NUC, NAS, or x86 server using the official installation guide. HA OS is the recommended path for new users; Docker is fine for power users.

Confirm you can reach the HA web UI at http://homeassistant.local:8123 before continuing.

3. Install the Selora AI integration

The custom component lives at github.com/SeloraHomes/ha-selora-ai. Two install paths:

Via HACS (recommended). HACS, the Home Assistant Community Store, handles updates automatically.

  1. Install HACS itself if you don't have it: HACS install guide
  2. In HA: HACS → Integrations → ⋮ → Custom repositories
  3. Add https://github.com/SeloraHomes/ha-selora-ai as type Integration
  4. Search for Selora AI, click Install, restart Home Assistant

Manual install. Clone directly into HA's custom_components folder:

cd /config/custom_components
git clone https://github.com/SeloraHomes/ha-selora-ai.git selora_ai
# Restart Home Assistant

4. Download the model files

From this HuggingFace repo, get:

  • qwen3_17b_base.Q6_K.gguf (the shared base, ~1.6 GB)
  • selora-command.f16.gguf
  • selora-automation.f16.gguf
  • selora-answer.f16.gguf
  • selora-clarification.f16.gguf
  • selora-utilities.f16.gguf
  • The Modelfile.* files (for Ollama users; skip for llama-server users)

Put them all in a single directory on the machine that'll run the model. Many users put this on the same box as HA; others run it on a dedicated GPU machine and point HA at it over the LAN.

5. Run the model locally

Pick one runtime; both are covered in the Quick start section above:

  • Ollama 0.30+: simpler if you already use Ollama. One model per specialist; the HA integration treats each as a separate provider.
  • llama-server: the reference runtime, with full LoRA hot-swap support. Best for the HA integration because it lets the integration pick the right specialist per turn.

Either way, the model needs to be reachable from wherever HA is running. Confirm with curl http://<host>:8080/v1/models (llama-server) or ollama list (Ollama).

6. Connect HA to Selora AI Local

In Home Assistant: Settings → Devices & Services → Add Integration → Selora AI. From the provider dropdown, pick Selora AI Local.

The integration auto-discovers a running llama-server (or Ollama) on the standard ports. If discovery fails, enter the host manually in the config flow.

7. Verify it works

Type one of these into the Selora AI chat panel that appears after setup:

  • "turn on the kitchen light" should flip a light
  • "what lights are on?" should list them
  • "create an automation that turns on the porch light at sunset" should produce an automation card
  • "turn on a light" should ask which one (if you have several)
  • "why is the living-room sensor unavailable?" should give docs-grounded troubleshooting steps

If they all work, you're done. If any fail, see Troubleshooting at the bottom of this page.

What's new in v0.4.8

New utilities specialist (slot 4)

A fifth specialist handles docs-grounded help: maintenance (pending updates, version conflicts), troubleshooting (why a device is unavailable / offline / not responding), and setup help (how to add or configure an integration). It is given the user question, the AVAILABLE ENTITIES list, and a RELEVANT DOCS list of retrieved Home Assistant documentation chunks, and returns a compact envelope:

{"r":"<advice with {entity_id} placeholders>","q":["<entity_id>",…],"src":["<doc_chunk_id>",…]}

r is advice grounded in the retrieved docs with {entity_id} placeholders for any live-state references; q lists the entities whose live state the advice depends on (so the consumer substitutes current values); src cites the doc chunks the advice is drawn from. The fix is grounded in the docs, never in entity state alone: the state is the signal, the docs supply the explanation.

Slot order

The specialist slot contract is now 0=command, 1=automation, 2=answer, 3=clarification, 4=utilities. Backends that hot-swap LoRAs (llama-server's /lora-adapters, vLLM --enable-lora) load all five at startup.

Entity-block format reconciled with the integration

format_entities_block in scripts/gen_utils.py emits the exact per-line shape produced by _format_entity_line in custom_components/selora_ai/llm_client/sanitize.py:

AVAILABLE ENTITIES:
  - entity_id=light.kitchen; state=off; friendly_name=Kitchen Lights
  - entity_id=sensor.sun; state=below_horizon; friendly_name=Sun

This keeps train-vs-inference entity-context blocks in lock-step so the model stays in-distribution.
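A sketch of a formatter that produces the block shown above. It is not the code in scripts/gen_utils.py or sanitize.py; the names sketch_entity_line and sketch_entities_block are hypothetical, and the handling of a missing state is inferred from the scene line in the training examples. Note that the training examples quote friendly_name values while this block does not, so check which form your integration version emits before relying on either.

```python
from typing import Optional


def sketch_entity_line(
    entity_id: str, state: Optional[str] = None, friendly_name: Optional[str] = None
) -> str:
    """Render one entity as '  - key=value; key=value' (illustrative only)."""
    parts = [f"entity_id={entity_id}"]
    if state is not None:  # some entities, e.g. scenes, may carry no state
        parts.append(f"state={state}")
    if friendly_name is not None:
        parts.append(f"friendly_name={friendly_name}")
    return "  - " + "; ".join(parts)


def sketch_entities_block(entities: list[dict]) -> str:
    """Render the AVAILABLE ENTITIES header followed by one line per entity."""
    lines = ["AVAILABLE ENTITIES:"]
    lines.extend(sketch_entity_line(**entity) for entity in entities)
    return "\n".join(lines)


block = sketch_entities_block([
    {"entity_id": "light.kitchen", "state": "off", "friendly_name": "Kitchen Lights"},
    {"entity_id": "sensor.sun", "state": "below_horizon", "friendly_name": "Sun"},
])
print(block)
```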

Inference

The manifest carries runtime.cache_prompt = true so the hub starts llama-server with system-prompt KV caching enabled, amortizing the per-specialist system prompt across requests.

Generation parameters

{
  "temperature": 0.0,
  "repeat_penalty": 1.0,
  "repeat_last_n": 256,
  "max_tokens": 384,
  "stop": ["<|im_end|>", "<|endoftext|>"]
}

Bump max_tokens to 1536 for automation requests (longer JSON output).
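One way to apply that rule in a client is a small per-specialist lookup. This is a sketch; the helper name generation_params is hypothetical and the values are the ones listed above.

```python
# Defaults copied from the generation parameters above.
BASE_PARAMS = {
    "temperature": 0.0,
    "repeat_penalty": 1.0,
    "repeat_last_n": 256,
    "max_tokens": 384,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}


def generation_params(specialist: str) -> dict:
    """Return a fresh parameter dict, with a larger budget for automations."""
    params = dict(BASE_PARAMS)
    params["stop"] = list(BASE_PARAMS["stop"])  # copy so callers cannot mutate the defaults
    if specialist == "automation":
        params["max_tokens"] = 1536  # automation output is longer JSON
    return params


print(generation_params("automation")["max_tokens"])
```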

Training

Base: Qwen3 1.7B fine-tuned with Apple mlx-lm. Each specialist has its own LoRA (rank 8–24, scale 20) trained on a curated HA-domain corpus (forum threads, HA docs, synthetic command / automation pairs). System prompts are trained per specialist; see prompts/. The answer adapter is trained to emit the slim {r,q} shape (a q list of entities to look up alongside its {entity_id}-templated response), which the integration resolves against live state; see prompts/answer_system_prompt.txt and the Modelfile.answer SYSTEM block. The utilities adapter is trained on docs-grounded maintenance / troubleshooting / setup examples with retrieved-doc context and citation provenance.

Training examples

Every specialist trains on the same wrapper: its system prompt, then a user turn carrying USER REQUEST and the AVAILABLE ENTITIES block (plus RELEVANT DOCS for utilities), with the target slim envelope as the completion. One real pair per specialist (entity / doc lists trimmed; requests and target envelopes verbatim):

# command
USER REQUEST: resume the living room TV
AVAILABLE ENTITIES:
  - entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"
→ {"c":[{"s":"media_player.media_play","e":"media_player.living_room_tv"}],"r":"Playing on living room TV."}

# answer
USER REQUEST: are hallway light and coffee maker on
AVAILABLE ENTITIES:
  - entity_id=light.hallway; state=off; friendly_name="hallway light"
  - entity_id=switch.coffee_maker; state=on; friendly_name="coffee maker"
→ {"q":["light.hallway","switch.coffee_maker"],"r":"Hallway light: {light.hallway}, coffee maker: {switch.coffee_maker}."}

# clarification
USER REQUEST: can you do something
AVAILABLE ENTITIES:
  - entity_id=light.entry; state=off; friendly_name="entry light"
  - entity_id=media_player.living_room_tv; state=idle; friendly_name="living room TV"
→ {"q":"Could you tell me what you'd like to do?"}

# automation
USER REQUEST: activate movie night every day at 7pm
AVAILABLE ENTITIES:
  - entity_id=scene.movie_night; friendly_name="movie night"
→ {"intent":"automation","response":"At **7pm** every day, this activates **movie night**. Want me to skip weekends?","description":"Activate scene scene.movie_night at 7pm.","automation":{"alias":"Auto-Movie Night","description":"Activate movie night at 7pm daily.","triggers":[{"trigger":"time","at":"19:00:00"}],"conditions":[],"actions":[{"service":"scene.turn_on","target":{"entity_id":"scene.movie_night"},"data":{}}]}}

# utilities
USER REQUEST: guide me through adding HACS
RELEVANT DOCS:
  - [hacs] Setting up the HACS integration (https://hacs.xyz/docs/use/configuration/basic/) — "Follow these steps to set up the HACS integration and authenticate it with GitHub…"
AVAILABLE ENTITIES:
  - entity_id=light.entry; state=off; friendly_name="entry light"
→ {"r":"To add HACS, go to Settings > Devices & Services > Add Integration and search for it, then follow the config flow. The integration's documentation page covers the prerequisites and step-by-step setup.","src":["hacs:https://hacs.xyz/docs/use/configuration/basic/#2","hacs:https://hacs.xyz/docs/use/configuration/options/#2"]}
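The user-turn wrapper in these examples can be assembled with a few lines of code. The sketch below follows the section order visible above (USER REQUEST, then RELEVANT DOCS for utilities, then AVAILABLE ENTITIES); the function name build_user_turn is hypothetical, and exact whitespace between sections in the real prompts is an assumption.

```python
from typing import Optional


def build_user_turn(
    request: str, entity_lines: list[str], doc_lines: Optional[list[str]] = None
) -> str:
    """Assemble the user turn from a request plus pre-formatted entity and doc lines."""
    parts = [f"USER REQUEST: {request}"]
    if doc_lines:  # only the utilities specialist receives retrieved docs
        parts.append("RELEVANT DOCS:")
        parts.extend(f"  - {line}" for line in doc_lines)
    parts.append("AVAILABLE ENTITIES:")
    parts.extend(f"  - {line}" for line in entity_lines)
    return "\n".join(parts)


turn = build_user_turn(
    "resume the living room TV",
    ['entity_id=media_player.living_room_tv; state=paused; friendly_name="living room TV"'],
)
print(turn)
```

When calling the model directly, pair this user turn with the matching system prompt from prompts/ so the request stays close to the training distribution.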

Evaluation

Selora AI is evaluated on the open allenporter/home-assistant-datasets suite, the harness behind the Home Assistant LLM leaderboard, so its results are comparable to other models' conversation agents. The headline numbers are in Results at the top of the card; what matters here is what each surface actually grades, since each scores a different slice of the stack:

  • The LoRA adapter. Each specialist is a small LoRA trained on the slim-envelope schema: one strict, compact output contract per intent (the service calls to run, the entities to read, the question to ask, or the blueprint to build), designed up front and baked into every training example. Learning that single fixed shape per intent is what lets the adapter emit it reliably, and a malformed envelope is a miss on every surface, so this training is what every number ultimately grades.
  • The integration is in the loop for assist, assist-mini, and questions. The adapter emits the envelope; _convert_slim_shape runs the command calls as real Home Assistant service calls and resolves the answer entity list into live-state answers; the metric then grades the resulting device state / produced answer. So those three are end-to-end scores: the adapter plus the integration's parser and routing, exactly how a user experiences it.
  • The raw model is graded directly for automations. The automation adapter's blueprint YAML is the scored artifact, so the integration is bypassed; routing it through would wrap the YAML in a prose summary, the wrong surface for authoring. The 66.7% is the bare adapter; the open misses are light_on_door, where it emits a 2 * 60 duration the Home Assistant loader rejects.

Files in this bundle

| Artifact | Purpose | Distribution |
| --- | --- | --- |
| qwen3_17b_base.Q6_K.gguf | Q6_K base for Ollama / llama.cpp | Hugging Face, ollama.com |
| selora-{intent}.f16.gguf (×5) | Specialist LoRA adapters | Hugging Face, ollama.com |
| Modelfile.{intent} | Ollama recipes (base + LoRA + system prompt) | this repo, ollama.com |
| prompts/{intent}_system_prompt.txt | Plain-text trained prompts (reference / testing) | this repo |

The full-precision (f16) base and HF safetensors set used by vLLM / TGI / SageMaker live separately in the cloud bundle and are not yet mirrored to Hugging Face.

First-run verification

Five prompts, one per specialist, let you confirm every slot loaded cleanly. Type them into HA's Selora AI panel (or hit the selora_ai/chat_stream WebSocket directly):

| Prompt | Specialist | Expected behaviour |
| --- | --- | --- |
| turn on the kitchen light | command | Light flips on; response: "Kitchen light on." |
| what lights are on? | answer | List of currently-on lights with [[entities:...]] markers |
| create an automation that turns on the porch light at sunset | automation | Automation card with trigger: sun, event: sunset and the porch light target |
| turn on a light (with multiple lights present) | clarification | Asks which one and offers options |
| why is the living-room sensor unavailable? | utilities | Docs-grounded troubleshooting steps citing source docs |

A clean run on all five means the LoRAs loaded, the classifier is routing correctly, and the v0.4.8 training format is reaching the model. If any prompt returns garbage or empty output, check Troubleshooting below.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Selora AI Local not in provider dropdown | Probe couldn't reach any host candidate | Verify curl http://localhost:8080/v1/models works on the HA host. Add the host manually in config flow if HA can't reach localhost (common on HA OS) |
| Chat returns empty / repeats one token | repeat_penalty != 1.0 somewhere | Confirm llama-server is started without an override, or that the Modelfile's PARAMETER repeat_penalty 1.0 line wasn't edited out |
| Wrong specialist responds (e.g. answer for a command) | Hot-swap call hasn't fired | Check HA logs for Activating LoRA slot N; if absent, the integration didn't classify the prompt as that intent, so file an issue with the prompt text |
| Model invents entity_ids that don't exist | AVAILABLE ENTITIES block not being sent | The integration sends this automatically; if you're hitting the model directly, mirror the integration's _format_entity_line output exactly (see "Entity-block format reconciled with the integration" above) |
| ollama run works but HA can't reach it | Ollama default localhost:11434, llama-server 0.0.0.0:8080 (different ports) | Either point the integration at port 11434 (Ollama path) or run llama-server explicitly. The integration probes :8080 first |
| Pipeline hangs for 30s on automation prompts | Pre-v0.4.7 build of the integration | Update the integration to current main |

For deeper issues, the integration's debug log (set custom_components.selora_ai: debug under logger: → logs: in configuration.yaml) prints the full classifier decision, the request payload sent to llama-server, and the raw model response, which is enough to diagnose any reproducible case.

Citation

@misc{selora-ai-2026,
  title  = {Selora AI: Qwen3 1.7B + LoRA Specialists for Home Assistant},
  author = {{Selora Homes}},
  year   = {2026},
  url    = {https://huggingface.co/selorahomes/Selora-AI}
}

License

Apache-2.0
