Jarvis.
A privacy-first, on-device AI assistant — your data never leaves your phone.
Every consumer AI assistant in 2025 ships your data to the cloud and keeps it there indefinitely. The marketing language is always 'we keep your data secure,' but the architecture is the same everywhere: every conversation, every memory, every preference is mirrored to a vendor server.
I wanted to design from the opposite default: assume the cloud is hostile, then build features that survive that assumption. The interesting design problem is how much you can do with a backend that genuinely doesn't see your data.
Architecture is a fat client + thin stateless backend. The Flutter app on the user's device holds: SQLite (via Drift) for structured data, ObjectBox for vector embeddings, TensorFlow Lite for on-device embedding inference, and a memory graph kept in Drift edge tables that mirrors the Apache AGE schema used for optional cloud sync. The backend is FastAPI with four routes: auth, LLM proxy, billing, and a relationship-inference sync endpoint that operates on anonymized tokens (PERSON_1, EMAIL_1) only.
The LLM proxy supports four providers (Anthropic, OpenAI, Gemini, Groq) via a factory pattern. Default tier is Groq with Llama 3.3 70B — free, fast, and good enough for most chat. Premium tiers unlock Claude / Gemini / GPT-4o. Token-quota tracking happens server-side; conversation contents do not.
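A minimal sketch of what that factory can look like, assuming an async Python backend; the class names, tier keys, and stubbed API calls are illustrative, not the actual Jarvis code:

```python
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface every provider adapter implements."""
    @abstractmethod
    async def complete(self, messages: list[dict]) -> str: ...

class GroqProvider(LLMProvider):
    async def complete(self, messages: list[dict]) -> str:
        raise NotImplementedError("call Groq's chat completions API here")

class AnthropicProvider(LLMProvider):
    async def complete(self, messages: list[dict]) -> str:
        raise NotImplementedError("call Anthropic's messages API here")

# Billing tier -> adapter; the free tier defaults to Groq.
_PROVIDERS: dict[str, type[LLMProvider]] = {
    "free": GroqProvider,
    "premium-claude": AnthropicProvider,
}

def provider_for_tier(tier: str) -> LLMProvider:
    """Factory: resolve a user's billing tier to a provider adapter."""
    try:
        return _PROVIDERS[tier]()
    except KeyError:
        raise ValueError(f"unknown tier: {tier}") from None
```

Swapping providers then touches one dictionary entry rather than every call site, which is the whole argument for the pattern.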
Anonymize on-device before any backend call
When the local agent needs to call the backend (for relationship inference, embeddings, etc.), it pre-tokenizes PII: real names become PERSON_1, real emails become EMAIL_1, real addresses become ADDR_1. The backend receives only tokens; the de-anonymization map stays on-device. This makes 'we don't see your data' an architectural property, not a marketing claim.
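The production tokenizer runs on-device in the Flutter client; the Python sketch below shows the core idea, with a toy email regex and a known-contacts list standing in for the real recognizers:

```python
import re

# Toy pattern; the real recognizer set is broader (addresses, phones, etc.).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def anonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PII with stable tokens; return the text plus a token -> real map.

    The reverse map is persisted locally and never sent to the backend.
    """
    real_to_token: dict[str, str] = {}

    def token_for(value: str, prefix: str) -> str:
        if value not in real_to_token:
            n = sum(t.startswith(prefix) for t in real_to_token.values()) + 1
            real_to_token[value] = f"{prefix}_{n}"
        return real_to_token[value]

    # Emails first, so a contact name inside an address is not half-replaced.
    text = EMAIL_RE.sub(lambda m: token_for(m.group(), "EMAIL"), text)
    for name in known_names:
        text = text.replace(name, token_for(name, "PERSON"))
    return text, {tok: real for real, tok in real_to_token.items()}

tokenized, demap = anonymize("Ask Ada to email me at ada@example.com", ["Ada"])
# tokenized == "Ask PERSON_1 to email me at EMAIL_1"; demap stays on-device.
```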
Drift + ObjectBox + TFLite, not a cloud vector DB
The whole point is on-device. Drift for relational, ObjectBox for vectors, TensorFlow Lite for embeddings — three local stores stitched together with Riverpod. Pinecone or Weaviate would have been easier; they would also have defeated the entire premise.
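For shape only, here is a Python stand-in for that pipeline: tf.lite.Interpreter for local embedding inference, with brute-force cosine similarity in place of ObjectBox's vector index (the model path, tensor layout, and vector store are all assumptions; the real code is Dart):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="embedder.tflite")  # hypothetical model
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def embed(token_ids: np.ndarray) -> np.ndarray:
    """Run the local embedding model; nothing leaves the device."""
    interpreter.set_tensor(inp["index"], token_ids.astype(inp["dtype"]))
    interpreter.invoke()
    vec = interpreter.get_tensor(out["index"])[0]
    return vec / np.linalg.norm(vec)

def nearest(query: np.ndarray, store: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k cosine search; `store` rows are unit vectors (ObjectBox in the app)."""
    return np.argsort(store @ query)[::-1][:k]
```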
Apache AGE for the memory graph
Conversations build up a relationship graph (people, places, events, references). Apache AGE gives a graph query layer on top of Postgres for the (rare) cases the user opts into cloud sync. On-device, the same graph shape lives in Drift with manual edge tables. Same model, two backends.
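To make "same model, two backends" concrete, here is a hypothetical "who does PERSON_1 know?" query in both shapes; the graph, table, and column names are invented for illustration:

```python
# Cloud-sync path: Apache AGE's cypher() function on Postgres
# (requires LOAD 'age' and the ag_catalog search_path in the session).
AGE_QUERY = """
SELECT * FROM cypher('memory', $$
    MATCH (p:Person {token: 'PERSON_1'})-[:KNOWS]->(q:Person)
    RETURN q.token
$$) AS (token agtype);
"""

# On-device path: Drift-managed SQLite with a manual edge table.
SQLITE_QUERY = """
SELECT q.token
FROM edges e
JOIN people p ON p.id = e.src AND p.token = 'PERSON_1'
JOIN people q ON q.id = e.dst
WHERE e.kind = 'KNOWS';
"""
```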
Why this stack

- Apache AGE: a Postgres extension that adds Cypher-style graph queries. Lets the same memory-graph model live on-device (Drift edge tables) and in the optional cloud sync (AGE) without a schema port.
- ObjectBox + TensorFlow Lite: full on-device embedding and vector search. No cloud dependency for similarity, no Pinecone bill, no privacy compromise.
- On-device anonymization: PERSON_1, EMAIL_1, ADDR_1 tokenization happens before any backend call. The de-anonymization map never leaves the device. Makes 'we don't see your data' an architectural property, not marketing.
- Groq free tier: Groq's LPU inference delivers sub-100ms time to first token. The free tier on Llama 3.3 70B beats GPT-3.5 quality at zero variable cost, which enables a generous free product without venture capital.
- Provider-pluggable inference: the factory pattern is a 2026 default for any serious AI product; single-vendor lock-in is too risky.
- On-device privacy is mostly an architecture problem, not a UX problem. Once you commit to 'backend never sees real data,' the feature set falls out: streaming inference is fine because the proxy is stateless (see the sketch after this list); any feature that requires cross-user data (e.g., 'find friends nearby') has to be reframed or dropped.
- Flutter + FastAPI is a productive pairing for fat-client apps. Drift's code generation and Riverpod's testability made the on-device layer feel server-grade.
- The hardest part was the anonymization layer: making it good enough to send to a cloud LLM without leaking implicit PII (e.g., 'my friend who owns the bakery in Park Slope' is identifying even after name redaction).
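As referenced in the first takeaway, here is a minimal sketch of a stateless streaming proxy route, assuming FastAPI plus httpx and an OpenAI-compatible upstream; the route path and upstream URL are illustrative:

```python
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()
UPSTREAM = "https://api.groq.com/openai/v1/chat/completions"

@app.post("/v1/chat")
async def chat(request: Request):
    payload = await request.json()  # already anonymized on-device

    async def relay():
        # Nothing is persisted server-side: bytes are forwarded as they arrive.
        # Provider auth header omitted for brevity.
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream("POST", UPSTREAM, json=payload) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk

    return StreamingResponse(relay(), media_type="text/event-stream")
```

Because the route holds no state beyond the in-flight request, quota accounting can hang off the auth layer while conversation bytes pass straight through.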