Jarvis.
A privacy-first, on-device AI assistant — your data never leaves your phone.
Every consumer AI assistant in 2025 ships your data to the cloud and keeps it there indefinitely. The marketing language is always 'we keep your data secure,' but the architecture is the same everywhere: every conversation, every memory, every preference is mirrored to a vendor server.
I wanted to design from the opposite default: assume the cloud is hostile, then build features that survive that assumption. The interesting design problem is how much you can do with a backend that genuinely doesn't see your data.
Architecture is a fat client + thin stateless backend. The Flutter app on the user's device holds: SQLite (via Drift) for structured data, ObjectBox for vector embeddings, TensorFlow Lite for on-device embedding inference, and a memory graph kept in Drift edge tables that mirrors the Apache AGE schema used for optional cloud sync. The backend is FastAPI with four routes: auth, LLM proxy, billing, and a relationship-inference sync endpoint that operates on anonymized tokens (PERSON_1, EMAIL_1) only.
The LLM proxy supports four providers (Anthropic, OpenAI, Gemini, Groq) via a factory pattern. Default tier is Groq with Llama 3.3 70B — free, fast, and good enough for most chat. Premium tiers unlock Claude / Gemini / GPT-4o. Token-quota tracking happens server-side; conversation contents do not.
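A minimal sketch of what that factory can look like, assuming an async Python backend; the class names, tier keys, and stubbed API calls are illustrative, not the actual Jarvis code:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface every provider adapter implements."""
    @abstractmethod
    async def complete(self, messages: list[dict]) -> str: ...

class GroqProvider(LLMProvider):
    async def complete(self, messages: list[dict]) -> str:
        raise NotImplementedError("call Groq's chat completions API here")

class AnthropicProvider(LLMProvider):
    async def complete(self, messages: list[dict]) -> str:
        raise NotImplementedError("call Anthropic's messages API here")

# Billing tier -> adapter; the free tier defaults to Groq.
_PROVIDERS: dict[str, type[LLMProvider]] = {
    "free": GroqProvider,
    "premium-claude": AnthropicProvider,
}

def provider_for_tier(tier: str) -> LLMProvider:
    """Factory: resolve a user's billing tier to a provider adapter."""
    try:
        return _PROVIDERS[tier]()
    except KeyError:
        raise ValueError(f"unknown tier: {tier}") from None
```

Swapping providers then touches one dictionary entry rather than every call site, which is the whole argument for the pattern.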
Anonymize on-device before any backend call
When the local agent needs to call the backend (for relationship inference, embeddings, etc.), it pre-tokenizes PII: real names become PERSON_1, real emails become EMAIL_1, real addresses become ADDR_1. The backend receives only tokens; the de-anonymization map stays on-device. This makes 'we don't see your data' an architectural property, not a marketing claim.
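The production tokenizer runs on-device in the Flutter client; the Python sketch below shows the core idea, with a toy email regex and a known-contacts list standing in for the real recognizers:

```python
import re

# Toy pattern; the real recognizer set is broader (addresses, phones, etc.).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PII with stable tokens; return the text plus a token -> real map.

    The reverse map is persisted locally and never sent to the backend.
    """
    real_to_token: dict[str, str] = {}

    def token_for(value: str, prefix: str) -> str:
        if value not in real_to_token:
            n = sum(t.startswith(prefix) for t in real_to_token.values()) + 1
            real_to_token[value] = f"{prefix}_{n}"
        return real_to_token[value]

    # Emails first, so a contact name inside an address is not half-replaced.
    text = EMAIL_RE.sub(lambda m: token_for(m.group(), "EMAIL"), text)
    for name in known_names:
        text = text.replace(name, token_for(name, "PERSON"))
    return text, {tok: real for real, tok in real_to_token.items()}

tokenized, demap = anonymize("Ask Ada to email me at ada@example.com", ["Ada"])
# tokenized == "Ask PERSON_1 to email me at EMAIL_1"; demap stays on-device.
```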
Drift + ObjectBox + TFLite, not a cloud vector DB
The whole point is on-device. Drift for relational, ObjectBox for vectors, TensorFlow Lite for embeddings — three local stores stitched together with Riverpod. Pinecone or Weaviate would have been easier; they would also have defeated the entire premise.
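For shape only, here is a Python stand-in for that pipeline: tf.lite.Interpreter for local embedding inference, with brute-force cosine similarity in place of ObjectBox's vector index (the model path, tensor layout, and vector store are all assumptions; the real code is Dart):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="embedder.tflite")  # hypothetical model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def embed(token_ids: np.ndarray) -> np.ndarray:
    """Run the local embedding model; nothing leaves the device."""
    interpreter.set_tensor(inp["index"], token_ids.astype(inp["dtype"]))
    interpreter.invoke()
    vec = interpreter.get_tensor(out["index"])[0]
    return vec / np.linalg.norm(vec)

def nearest(query: np.ndarray, store: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k cosine search; `store` rows are unit vectors (ObjectBox in the app)."""
    return np.argsort(store @ query)[::-1][:k]
```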
Apache AGE for the memory graph
Conversations build up a relationship graph (people, places, events, references). Apache AGE gives a graph query layer on top of Postgres for the (rare) cases the user opts into cloud sync. On-device, the same graph shape lives in Drift with manual edge tables. Same model, two backends.
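To make "same model, two backends" concrete, here is a hypothetical "who does PERSON_1 know?" query in both shapes; the graph, table, and column names are invented for illustration:

```python
# Cloud-sync path: Apache AGE's cypher() function on Postgres
# (requires LOAD 'age' and the ag_catalog search_path in the session).
AGE_QUERY = """
SELECT * FROM cypher('memory', $$
    MATCH (p:Person {token: 'PERSON_1'})-[:KNOWS]->(q:Person)
    RETURN q.token
$$) AS (token agtype);
"""

# On-device path: Drift-managed SQLite with a manual edge table.
SQLITE_QUERY = """
SELECT q.token
FROM edges e
JOIN people p ON p.id = e.src AND p.token = 'PERSON_1'
JOIN people q ON q.id = e.dst
WHERE e.kind = 'KNOWS';
"""
```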
Why this stack

- Apache AGE: a Postgres extension that adds Cypher-style graph queries. Lets the same memory-graph model live on-device (Drift edge tables) and in the optional cloud sync (AGE) without a schema port.
- ObjectBox + TensorFlow Lite: full on-device embedding and vector search. No cloud dependency for similarity, no Pinecone bill, no privacy compromise.
- On-device anonymization: PERSON_1, EMAIL_1, ADDR_1 tokenization happens before any backend call. The de-anonymization map never leaves the device. Makes 'we don't see your data' an architectural property, not marketing.
- Groq free tier: Groq's LPU inference delivers sub-100ms time to first token. The free tier on Llama 3.3 70B beats GPT-3.5 quality at zero variable cost, which enables a generous free product without venture capital.
- Provider-pluggable inference: the factory pattern is a 2026 default for any serious AI product; single-vendor lock-in is too risky.
- On-device privacy is mostly an architecture problem, not a UX problem. Once you commit to 'backend never sees real data,' the feature set falls out: streaming inference is fine because the proxy is stateless (see the sketch after this list); any feature that requires cross-user data (e.g., 'find friends nearby') has to be reframed or dropped.
- Flutter + FastAPI is a productive pairing for fat-client apps. Drift's code generation and Riverpod's testability made the on-device layer feel server-grade.
- The hardest part was the anonymization layer: making it good enough to send to a cloud LLM without leaking implicit PII (e.g., 'my friend who owns the bakery in Park Slope' is identifying even after name redaction).
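As referenced in the first takeaway, here is a minimal sketch of a stateless streaming proxy route, assuming FastAPI plus httpx and an OpenAI-compatible upstream; the route path and upstream URL are illustrative:

```python
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
UPSTREAM = "https://api.groq.com/openai/v1/chat/completions"

@app.post("/v1/chat")
async def chat(request: Request):
    payload = await request.json()  # already anonymized on-device

    async def relay():
        # Nothing is persisted server-side: bytes are forwarded as they arrive.
        # Provider auth header omitted for brevity.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", UPSTREAM, json=payload) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="text/event-stream")
```

Because the route holds no state beyond the in-flight request, quota accounting can hang off the auth layer while conversation bytes pass straight through.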