TANAY.SHAH
▴ PROTOTYPE · 2025 · MULTIMODAL PERSONAL AI · UPDATED 2026-05-10

Jarvis.

A privacy-first, on-device AI assistant — your data never leaves your phone.

// 01 — WHY I BUILT IT
THE PROBLEM

Every consumer AI assistant in 2025 ships your data to the cloud and keeps it there indefinitely. The marketing language is always 'we keep your data secure,' but the architecture is unchanged: every conversation, every memory, every preference is mirrored to a vendor server.

I wanted to design from the opposite default: assume the cloud is hostile, then build features that survive that assumption. The interesting design problem is how much you can do with a backend that genuinely doesn't see your data.

// 02 — THE APPROACH
THE WORK

Architecture is a fat client + thin stateless backend. Flutter app on the user's device holds: SQLite (Drift) for structured data, ObjectBox for vector embeddings, TensorFlow Lite for on-device embedding inference, and a memory graph in Apache AGE format. Backend is FastAPI with four routes: auth, LLM proxy, billing, and a relationship-inference sync endpoint that operates on anonymized tokens (PERSON_1, EMAIL_1) only.

The LLM proxy supports four providers (Anthropic, OpenAI, Gemini, Groq) via a factory pattern. Default tier is Groq with Llama 3.3 70B — free, fast, and good enough for most chat. Premium tiers unlock Claude / Gemini / GPT-4o. Token-quota tracking happens server-side; conversation contents do not.
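A provider factory of this kind is conventionally a registry keyed by name. The sketch below shows the pattern only — class names, the `complete` signature, and the model strings are placeholders, not the app's real implementation:

```python
# Illustrative multi-provider factory. Two of the four providers are shown;
# "openai" and "gemini" would register the same way. All names are assumed.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: list[dict]) -> str: ...

class GroqProvider(LLMProvider):
    model = "llama-3.3-70b"            # default free tier
    def complete(self, messages: list[dict]) -> str:
        return f"[{self.model}] ..."   # real impl would call Groq's API

class AnthropicProvider(LLMProvider):
    model = "claude"                   # premium tier
    def complete(self, messages: list[dict]) -> str:
        return f"[{self.model}] ..."

PROVIDERS: dict[str, type[LLMProvider]] = {
    "groq": GroqProvider,
    "anthropic": AnthropicProvider,
}

def make_provider(name: str) -> LLMProvider:
    """Factory: instantiate a provider by tier/provider name."""
    return PROVIDERS[name]()
```

Swapping tiers then reduces to a one-line lookup, and adding a fifth provider touches only the registry.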

// 03 — KEY DECISIONS
WHAT I CHOSE & WHY
DECISION · 01

Anonymize on-device before any backend call

When the local agent needs to call the backend (for relationship inference, embeddings, etc.), it pre-tokenizes PII: real names become PERSON_1, real emails become EMAIL_1, real addresses become ADDR_1. The backend receives only tokens; the de-anonymization map stays on-device. This makes 'we don't see your data' an architectural property, not a marketing claim.

DECISION · 02

Drift + ObjectBox + TFLite, not a cloud vector DB

The whole point is on-device. Drift for relational, ObjectBox for vectors, TensorFlow Lite for embeddings — three local stores stitched together with Riverpod. Pinecone or Weaviate would have been easier; they would also have defeated the entire premise.

DECISION · 03

Apache AGE for the memory graph

Conversations build up a relationship graph (people, places, events, references). Apache AGE gives a graph query layer on top of Postgres for the (rare) cases the user opts into cloud sync. On-device, the same graph shape lives in Drift with manual edge tables. Same model, two backends.

// 04 — ARCHITECTURE
HOW IT FITS TOGETHER
// FIG. SYSTEM DIAGRAM · SCALE 1:N
Jarvis architecture — fat client with PII anonymization layer. A fat client holds session and history locally and routes requests through a PII anonymization layer that scrubs identifying entities before they reach the LLM. A thin stateless backend brokers provider calls, and anonymized tokens are re-hydrated client-side after the response returns. The design minimizes server-side data retention while preserving a smooth chat UX.

ON-DEVICE · LOCAL — Flutter app (chat · agent · memory · settings · onboarding) · Drift (SQLite, relational: conversations, AGE-shaped memory graph) · ObjectBox (on-device vectors, embedding store, no Pinecone bill) · TensorFlow Lite (embedding inference, on-device model, no cloud round-trip) · agent loop (Riverpod state, OAuth bridges) · PII anonymizer (names → PERSON_1 · emails → EMAIL_1 · addresses → ADDR_1).
TRUST BOUNDARY · ANON — only anonymized tokens cross.
THIN STATELESS BACKEND — FastAPI, 4 routes: /auth · /llm (proxy + stream) · /billing (Stripe) · /sync (anonymized graph). Never sees real user data; stores only quotas and Stripe webhook events.
LLM PROVIDERS · PROXIED — multi-provider factory: Anthropic · OpenAI · Gemini · Groq (Llama).
// 05 — STATE OF THE ART
2026 BLEEDING-EDGE TECH
Apache AGE (graph DB on Postgres)

Postgres extension that adds Cypher-style graph queries. Lets the same memory-graph model live on-device (Drift edge tables) and in the optional cloud sync (AGE) without a schema port.
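For the opt-in cloud-sync path, AGE embeds Cypher inside a SQL statement. The query below shows that shape only; the graph name, labels, and edge type are hypothetical:

```python
# Hedged example of the Cypher-over-SQL form Apache AGE accepts. The graph
# 'memory_graph' and the PERSON/EVENT labels are illustrative, not the
# project's actual schema. Would be executed over psycopg against Postgres.
AGE_QUERY = """
SELECT * FROM cypher('memory_graph', $$
    MATCH (p:PERSON)-[:MENTIONED_AT]->(e:EVENT)
    RETURN p, e
$$) AS (person agtype, event agtype);
"""
```

On-device, the same MATCH semantics are emulated with joins over Drift's manual edge tables, so the graph model ports without a schema rewrite.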

TensorFlow Lite + ObjectBox vectors

Full on-device embedding + vector search. No cloud dependency for similarity, no Pinecone bill, no privacy compromise.
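Conceptually, the on-device retrieval step is just nearest-neighbor search over locally stored embeddings. A brute-force Python stand-in for what ObjectBox's vector index does (ObjectBox uses an HNSW index; nothing here is its actual API):

```python
# Brute-force cosine-similarity search, standing in for ObjectBox's indexed
# vector queries. In the app, query embeddings come from TensorFlow Lite.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the keys of the k stored embeddings most similar to the query."""
    ranked = sorted(store, key=lambda key: cosine(query, store[key]), reverse=True)
    return ranked[:k]
```

The privacy property falls out of the locality: both the embedding model and the vector store run on the phone, so similarity search never produces network traffic.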

On-device PII anonymization

PERSON_1, EMAIL_1, ADDR_1 tokenization happens before any backend call. The de-anonymization map never leaves the device. Makes 'we don't see your data' an architectural property, not marketing.

Groq + Llama 3.3 70B as the default tier

Groq's LPU inference is sub-100ms first-token. Free tier on Llama 3.3 70B beats GPT-3.5 quality at zero variable cost — enables a generous free product without venture capital.

Multi-provider AI factory (Anthropic / OpenAI / Gemini / Groq)

Provider-pluggable inference. The factory pattern is a 2026 default for any serious AI product — single-vendor lock-in is too risky.

// 06 — MEASURED
NUMBERS THAT MATTER
Backend routes
4
auth · llm · billing · sync
On-device stores
3
Drift · ObjectBox · TFLite
LLM providers
4
Anthropic · OpenAI · Gemini · Groq
// 07 — IF I DID IT AGAIN
LESSONS · WHAT I'D CHANGE
  • On-device privacy is mostly an architecture problem, not a UX problem. Once you commit to 'backend never sees real data,' the feature set falls out: streaming inference is fine because the proxy is stateless; any feature that requires cross-user data (e.g., 'find friends nearby') has to be reframed or dropped.
  • Flutter + FastAPI is a productive pairing for fat-client apps. Drift's code generation and Riverpod's testability made the on-device layer feel server-grade.
  • The hardest part was building the anonymization layer that's good enough to send to a cloud LLM without leaking implicit PII (e.g., 'my friend who owns the bakery in Park Slope' is identifying even after name redaction).
// 08 — STACK
THE TOOLS
CLIENT
Flutter / Dart
STORE
Drift (SQLite) · ObjectBox
ML
TensorFlow Lite
STATE
Riverpod
BACKEND
FastAPI · PostgreSQL + pgvector + Apache AGE
LLM
Anthropic · OpenAI · Gemini · Groq
BILLING
Stripe