Skip to content
Unpaid tier and Google One users: Gemini CLI will be replaced by Antigravity CLI on June 18th. To learn more, see our blog post.

`gemini gemma` — Automated Local Model Routing Setup

Local model routing uses a local Gemma 3 1B model running on your machine to classify and route user requests. It routes simple requests (like file reads) to Gemini Flash and complex requests (like architecture discussions) to Gemini Pro.

This feature saves cloud API costs by using local inference for task classification instead of a cloud-based classifier. It adds a few milliseconds of local latency but can significantly reduce the overall token usage for hosted models.

Terminal window
# One command does everything: downloads runtime, pulls model, configures settings, starts server
gemini gemma setup

You’ll be prompted to accept the Gemma Terms of Use. The model is ~1 GB.

After setup, just use the CLI normally — routing happens automatically on every request.

CommandWhat it does
gemini gemma setupFull install (binary + model + settings + server start)
gemini gemma statusHealth check — shows what’s installed and running
gemini gemma startStart the LiteRT server (auto-starts on CLI launch by default)
gemini gemma stopStop the LiteRT server
gemini gemma logsTail the server logs to see routing requests live
/gemmaIn-session status check (type it inside the CLI)
  1. Run gemini gemma status — all checks should show green
  2. Open two terminals:
    • Terminal 1: gemini gemma logs (watch for incoming requests)
    • Terminal 2: use the CLI normally
  3. You should see classification requests appear in the logs as you interact with the CLI
  4. The /gemma slash command inside a session shows a quick status panel
Terminal window
gemini gemma setup --port 8080 # custom port
gemini gemma setup --no-start # don't start server after install
gemini gemma setup --force # re-download everything
gemini gemma setup --skip-model # binary only, skip the 1GB model download
  • Local Gemma classifies each request as “simple” or “complex” (~100ms)
  • Simple → Flash, Complex → Pro
  • If the local server is down, the CLI silently falls back to the cloud classifier — no errors, no disruption

Set enabled: false in settings or just run gemini gemma stop to turn off the server:

{ "experimental": { "gemmaModelRouter": { "enabled": false } } }

If you are in an environment where the gemini gemma setup command cannot automatically download binaries (for example, behind a strict corporate firewall), you can perform the setup manually.

For more information, see the Manual Local Model Routing Setup guide.