This guide explains, in plain language, how a SunFounder Pidog is wired to a little "router" program that decides, for every single thing you say, the smartest and cheapest way to answer it. No jargon required.
The Pidog itself is a small Raspberry Pi computer. It's great at listening, talking, moving, and seeing, but it isn't powerful enough to run a serious AI. So it hands the thinking to a second computer (a Mac) running the Router. They talk over your home Wi-Fi.
Not every question deserves the same effort. Asking "who's a good boy?" shouldn't cost money or wait on the internet. The Router keeps a ladder of options and climbs it only as far as a question needs.
A compact 3-billion-parameter model (Qwen2.5-3B) running right on the Mac.
Instant, free, private. Handles greetings, jokes, quick chit-chat.
A 7-billion-parameter model (Qwen2.5-7B), also on the Mac. Slower but
better at reasoning: "why", "how does", "compare", "explain". Still free and private.
Anthropic's Claude models via the API. Reserved for the hard stuff: current events, long or expert answers, and anything needing a web search. Costs a little per use, so it's used sparingly.
Not an AI at all: your existing smart-home hub. "Turn off the kitchen" is sent straight here so it controls the lights and speaks the confirmation back.
| Brain | Lives | Good for | Speed | Cost |
|---|---|---|---|---|
| FAST (3B) | Mac | chit-chat, quick replies | very fast | free |
| SMART (7B) | Mac | reasoning, explanations | ~20 words/sec | free |
| CLOUD (Claude) | internet | current events, expertise, search | fast, has network hop | pennies |
| HOME (HA) | your hub | lights, locks, thermostats | instant | free |
The Router reads your words and checks them against simple rules (key phrases and length) before any AI is even involved. This costs microseconds and keeps easy things cheap.
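To make the rule check concrete, here is a minimal sketch of phrase-and-length routing. The phrase lists, the word-count cutoff, the order of the checks, and the function names are all assumptions for illustration; the guide only says that key phrases and length are used, not what the actual rules are.

```python
import re

# Hypothetical phrase lists and threshold: the guide names the ideas
# (key phrases and length) but not the actual values.
HOME_HINTS = ("turn on", "turn off", "lights", "thermostat", "lock")
CLOUD_HINTS = ("news", "weather", "today", "latest", "search")
SMART_HINTS = ("why", "how does", "compare", "explain")
LONG_QUESTION_WORDS = 25  # assumed cutoff for a "long" question


def contains_phrase(text: str, phrases: tuple) -> bool:
    """True if any phrase appears in text on word boundaries."""
    return any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases)


def pick_route(utterance: str) -> str:
    """Return 'home', 'cloud', 'smart', or 'fast' for one utterance."""
    text = utterance.lower().strip()
    if contains_phrase(text, HOME_HINTS):
        return "home"
    if contains_phrase(text, CLOUD_HINTS) or len(text.split()) > LONG_QUESTION_WORDS:
        return "cloud"
    if contains_phrase(text, SMART_HINTS):
        return "smart"
    return "fast"  # default: the cheapest brain
```

Because this is only string matching, it runs before any model is loaded or called, which is what keeps the easy cases cheap.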
All the listening and speaking happens on the Pi. Each half has options so you can trade convenience against privacy and quality.
Google (default): accurate, needs internet.
Vosk: fully offline, private.
Home Assistant: use your HA voice engine (e.g. Whisper).
Local (default): the Pidog's built-in voice, free & offline.
ElevenLabs: a far more natural cloud voice.
Home Assistant: HA's Piper/cloud voices.
Talking is only half a robot dog. There are two ways it physically acts.
Say a plain command (sit, come, shake hands, turn left) and the dog does it instantly, without asking any AI at all.
Works even if the internet and the Mac are off. Rock-solid and fast.
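A direct command like this can be pictured as a plain table lookup on the Pi. The sketch below is an assumption about how that might look: the table contents, the move names, and `match_direct_command` are hypothetical, not the Pidog's actual code.

```python
import string
from typing import Optional

# Hypothetical phrase-to-move table; the real command names may differ.
DIRECT_COMMANDS = {
    "sit": "sit",
    "come": "come",
    "shake hands": "shake_hands",
    "turn left": "turn_left",
}

# Translation table that deletes punctuation such as "!" or ".".
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def match_direct_command(heard: str) -> Optional[str]:
    """Return a move name if the whole utterance is a known command, else None."""
    text = heard.lower().translate(_STRIP_PUNCT).strip()
    return DIRECT_COMMANDS.get(text)
```

Matching the whole utterance, rather than searching inside it, means a sentence that merely contains the word "sit" still goes to the Router instead of triggering a move.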
For anything conversational, the AI can attach a hidden action to its reply. Ask it to "pretend you're sleepy" and it might yawn a sentence and lie down. It can even chain a short routine (turn, walk, sit) in one go.
Behind the scenes, the AI appends a tag such as
[[action:turn_left,forward,sit]]
to its reply. The Router removes that tag before the dog speaks (so it's never said aloud) and
passes the moves to the Pi to perform in order. Unknown moves are ignored safely, so a bad guess never
crashes the dog.
Ask the dog what it sees and it snaps a photo with its camera, sends the picture along with your question, and describes what's in front of it. Because only Claude can look at images, vision questions always go to the CLOUD brain automatically.
Trigger phrases include "what do you see", "who is this", "what am I holding", and "take a look". Any other time, the camera stays off.
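Returning to the hidden action tag described earlier: the strip-and-forward step could look roughly like the sketch below. The tag shape is taken from the guide's own example, but the regular expression, the `KNOWN_MOVES` set, and the `split_reply` name are assumptions made for illustration.

```python
import re

# Tag shape assumed from the guide's example: [[action:a,b,c]]
ACTION_TAG = re.compile(r"\[\[action:([^\]]*)\]\]")

# Hypothetical set of moves the Pi knows how to perform.
KNOWN_MOVES = {"sit", "lie_down", "turn_left", "turn_right", "forward", "wag_tail"}


def split_reply(reply: str):
    """Return (text to speak, ordered list of known moves)."""
    moves = []
    for match in ACTION_TAG.finditer(reply):
        for name in match.group(1).split(","):
            name = name.strip().lower()
            if name in KNOWN_MOVES:  # unknown moves are dropped, never raised
                moves.append(name)
    # Remove every tag so it is never spoken, then tidy leftover spaces.
    spoken = ACTION_TAG.sub("", reply)
    spoken = re.sub(r"\s{2,}", " ", spoken).strip()
    return spoken, moves
```

Filtering against a fixed set is one simple way to get the "a bad guess never crashes the dog" behaviour: anything the model invents simply never reaches the motors.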
If you run Home Assistant, commands like "turn off the kitchen" are forwarded to it. HA does the understanding (it already knows your rooms and devices), then acts and hands back a spoken reply. If HA doesn't recognise the phrase, it quietly falls through to the AI instead.
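The fall-through can be sketched as "try the hub, and use the AI if it has nothing to say". Both callables below are stand-ins rather than real Home Assistant or Router functions, and treating an unreachable hub the same as an unrecognised phrase is an assumption, not something the guide states.

```python
from typing import Callable, Optional, Tuple

# Stand-ins: ask_home would wrap whatever call reaches Home Assistant and
# return None when the phrase is not recognised; ask_ai would wrap the
# normal brain ladder.
AskHome = Callable[[str], Optional[str]]
AskAI = Callable[[str], str]


def answer_home_command(utterance: str, ask_home: AskHome, ask_ai: AskAI) -> Tuple[str, str]:
    """Try the hub first; fall through to the AI if it has no answer.

    Returns (source, spoken_reply), where source is 'home' or 'ai'.
    """
    try:
        reply = ask_home(utterance)
    except Exception:
        reply = None  # assumption: a hub error is handled like "not recognised"
    if reply:
        return "home", reply
    return "ai", ask_ai(utterance)
```

Passing the two brains in as functions keeps the sketch independent of any particular hub API and makes the fall-through easy to try out with fakes.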
The CLOUD brain can search the web on its own. Ask for the weather or today's news and Claude decides by itself to look it up, then answers with live information. You don't program the "when"; it just knows. (Local models can't search, which is another reason those questions route to the cloud.)
Some behaviours don't involve thinking at all: they're reflexes, just like a real dog, and run quietly in the background on the Pi.
Touch its head and it wags its tail instantly, no AI involved.
Tilts its head curiously toward an unexpected noise.
Refuses to walk into something close ahead and barks instead of bumping.
Open http://<your-mac>:8000/dashboard in any browser on your network to see a
running feed of every question: which brain answered, how fast, how many words, the estimated
cost, any moves performed, and the full conversation. It updates every couple of seconds, which is handy
for tuning and genuinely fun to watch.
Every major feature is a simple switch in a settings file (.env). A few examples:
| Setting | What it does |
|---|---|
| CLOUD_MODEL | Which Claude model to use (cheaper Haiku vs. smarter Sonnet). |
| CLOUD_WEB_SEARCH | Let the cloud brain search the web. 0 to turn off. |
| ROUTER_DEFAULT_ROUTE | Send everything to one brain, e.g. cloud if you run no local models. |
| PIDOG_ACTIONS | Allow the AI to move the dog. 0 for a text-only setup. |
| STREAM_TTS | Speak sentence-by-sentence as the answer generates. |
| PIDOG_CAMERA · PIDOG_REFLEXES · IDLE_MINUTES | Vision, reflexes, and idle behaviours. |
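As an illustration of how such switches might be read, the sketch below parses .env-style text into a small settings object. The setting names come from the table above; the parser, the default values, and the extra "off" spellings beyond 0 are assumptions, not the Router's actual loader.

```python
from dataclasses import dataclass

OFF_VALUES = {"0", "false", "off", "no"}  # the guide only documents "0"


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_on(values: dict, key: str, default: bool) -> bool:
    """Read an on/off switch, falling back to default when unset."""
    raw = values.get(key, "")
    if raw == "":
        return default
    return raw.lower() not in OFF_VALUES


@dataclass
class Settings:
    cloud_web_search: bool
    pidog_actions: bool
    stream_tts: bool
    default_route: str  # "" means "let the rules decide" (assumed)


def load_settings(text: str) -> Settings:
    values = parse_env(text)
    return Settings(
        cloud_web_search=is_on(values, "CLOUD_WEB_SEARCH", True),  # defaults assumed
        pidog_actions=is_on(values, "PIDOG_ACTIONS", True),
        stream_tts=is_on(values, "STREAM_TTS", True),
        default_route=values.get("ROUTER_DEFAULT_ROUTE", ""),
    )
```

For example, a file containing `CLOUD_WEB_SEARCH=0` and `ROUTER_DEFAULT_ROUTE=cloud` would, under these assumptions, send everything to the cloud brain with web search disabled.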