Conversation with Gemini
PROMPT 3: LIVE VOICE-ENABLED L1-L4 AGENT HIERARCHY
---START PROMPT---
I am designing a commercial product that sells AI agents at 4 tiers, each progressively more capable. Each tier can optionally include live voice calling. I need you to design the complete technical architecture. Here is the full context:

Business Context

The product is called ReceptionistAI (targeting Australian businesses, especially tradies -- electricians, plumbers, builders). Agents answer calls, book appointments, qualify leads, and manage follow-ups. We sell 4 tiers:
L1 -- Basic Agent ($47/month)
- Text-only AI assistant
- Simple memory (conversation history only)
- Web search capability
- Knowledge base Q&A (upload docs, get answers)
- No voice, no browser automation
- Target: Solo operators who want a basic AI chatbot
- Estimated margin: 85% (costs ~$7/month to run)
L2 -- Digital Employee ($497-697/month)
- Everything in L1, plus:
- Live voice calling (inbound + outbound) via Telnyx telephony
- Persistent memory across sessions (remembers client preferences)
- Browser automation (can navigate booking systems, CRMs)
- Workflow orchestration (multi-step task execution)
- Target: Small businesses (5-20 employees) wanting a virtual receptionist
- Estimated margin: 70% (costs ~$150-200/month to run)

L3 -- Domain Expert ($997-1,497/month)
- Everything in L2, plus:
- Pre-loaded mastery of specific platforms (GoHighLevel CRM, ServiceM8, Tradify, etc.)
- Industry-specific knowledge (trade regulations, scheduling patterns, pricing norms)
- Custom voice personality (trained on business tone)
- Advanced analytics dashboard
- Target: Growing businesses (20-100 employees) in specific verticals
- Estimated margin: 65% (costs ~$350-525/month to run)
L4 -- Executive Agent ($5,000-10,000/month)
- Everything in L3, plus:
- Full autonomous operation with sovereign memory
- IDE access (can modify its own workflows)
- Multi-agent coordination (spawns sub-agents for parallel tasks)
- Constitutional guard (self-governance with human override)
- Custom model fine-tuning
- Dedicated infrastructure (not shared)
- Target: Enterprise clients wanting a fully autonomous AI executive
- Estimated margin: 60% (costs ~$2,000-4,000/month to run)

Current Voice Infrastructure
- Telephony: Telnyx (Australian numbers, HD voice enabled by default)
- Voice AI: VAPI for voice agent orchestration
- Voice Model: google/gemini-2.5-flash (Kinan has credits)
- Voice: Telnyx NaturalHD voices (2,287 available, currently using "eucalyptus" -- Australian Female)
- Phone Number: +61731304377 (proof-of-concept AIVA number)
- TeXML App: 2894835995620935031
- Assistant ID: assistant-696799a5-e994-4ac1-8f26-7b0923aee682
Proof of Concept: AIVA (L4 Agent)

AIVA is our L4 proof-of-concept. She runs on a Mac Mini M4, operates 24/7 via OpenClaw persistence layer, and has:
- 59 planned skills (18 currently operational)
- 4 sub-agents (Strategist/Claude Sonnet, Executor/Gemini Flash, Local/GLM-9B, Voice/VAPI)
- 3-tier memory (Redis working, PostgreSQL episodic, Qdrant semantic) + FalkorDB knowledge graph
- NAMP (Nascent Autonomy Maturation Protocol) for graduated autonomy
- 6 cron jobs (intelligence gathering, morning briefing, health checks, memory cleanup, decision review, weekly calibration)
- Daily budget: $5.00
- Coronation criteria: 59/59 skills, NAMP >85%, 15+ autonomous decisions/day, $10K+/month revenue influenced

The 5 Backend Memory Systems

All tiers draw from a shared memory infrastructure (Elestio-hosted):
1. PostgreSQL (port 25432) -- Relational data, customer records, agent state
2. Qdrant (port 6333) -- Vector embeddings for semantic search (1536-dim, Cosine)
3. Redis (port 26379) -- Cache, queues, rate limiting, session state
4. FalkorDB (localhost:6379) -- Knowledge graph (952 nodes, 285 edges)
5. Supermemory -- Cross-session persistence (genesis-kinan collection)
Multi-Agent Model Pricing (Our Costs)
┌──────────────────┬────────────┬─────────────┬───────────────────────────────┐
│ Model            │ Input/MTok │ Output/MTok │ Best For                      │
├──────────────────┼────────────┼─────────────┼───────────────────────────────┤
│ Claude Opus 4.6  │ $15.00     │ $75.00      │ L4 strategic reasoning only   │
│ Claude Sonnet 4  │ $3.00      │ $15.00      │ L2-L3 coordination, synthesis │
│ Claude Haiku 4   │ $0.80      │ $4.00       │ L1-L2 high-volume execution   │
│ Gemini 2.5 Flash │ $0.30      │ $2.50       │ All tiers, voice, code gen    │
│ Gemini 2.5 Pro   │ Higher     │ Higher      │ L3-L4 complex reasoning       │
└──────────────────┴────────────┴─────────────┴───────────────────────────────┘
Voice costs (Telnyx): ~$0.01-0.03/minute for telephony + VAPI processing fees.
Architecture Challenge

I need a scalable architecture where:
- Each customer gets their own isolated agent (memory isolation is critical -- Customer A must NEVER see Customer B's data)
- Agents at each tier share infrastructure but have different capability gates
- Voice calling must handle real-time conversation with <300ms latency for natural speech
- The system must scale to 100+ L1 agents, 20+ L2 agents, 10+ L3 agents, 3-5 L4 agents concurrently
- Upgrade path from L1 to L4 must be seamless (no data migration, capabilities just unlock)
Questions for Deep Think

Voice Architecture (Priority: Highest)

1. Voice Pipeline Design: Design the complete voice interaction pipeline for each tier:
- L1: No voice (text only)
- L2: Inbound call answering + basic outbound (appointment reminders)
- L3: Full duplex voice with platform-specific actions (e.g., "Book a job in ServiceM8 for Tuesday at 2pm")
- L4: Autonomous voice with self-initiated calls, multi-party conferencing, call transfers
What is the optimal pipeline? Telnyx -> VAPI -> Model -> VAPI -> Telnyx? Or should L3/L4 bypass VAPI for lower latency?

2. Voice Latency Budget: For natural conversation, total round-trip must be <800ms (ideally <500ms). Break down the latency budget:
- Telnyx telephony: ~Xms
- Speech-to-text: ~Xms
- Model inference: ~Xms
- Text-to-speech: ~Xms
- Network overhead: ~Xms
Where are the bottlenecks? How do L3/L4 stay fast with more complex reasoning?

3. Concurrent Voice Calls: At scale (20+ L2 agents, 10+ L3 agents), how do you handle 30+ simultaneous voice calls? What infrastructure is needed? How do you prevent one customer's high call volume from degrading another's quality?
Memory Isolation (Priority: Critical)

4. Tenant Isolation: Design the memory isolation architecture. Each customer agent needs:
- Its own PostgreSQL schema or row-level security
- Its own Qdrant collection or namespace
- Its own Redis key prefix
- Its own FalkorDB subgraph or separate graph
- ZERO data leakage between tenants
What is the most cost-effective isolation strategy that scales to 100+ tenants? Schema-per-tenant vs. shared-schema-with-RLS vs. separate databases?

5. Shared vs. Isolated Knowledge: Some knowledge should be shared (industry regulations, platform how-tos) while customer-specific data must be isolated. Design the knowledge layering:
- Layer 1: Platform knowledge (shared across all L3 agents using same platform)
- Layer 2: Industry knowledge (shared across all agents in same vertical)
- Layer 3: Customer-specific knowledge (isolated, customer-owned)
How do you implement this layering in practice across 5 memory systems?
Tier Gating (Priority: High)

6. Capability Unlocking: How does the upgrade path work technically? When a customer upgrades from L2 to L3:
- Voice capabilities already exist -- what changes?
- Domain knowledge must be pre-loaded -- how? (fine-tuning vs. RAG vs. prompt injection)
- Analytics dashboard unlocks -- is this a feature flag or separate deployment?
- Is each tier a different codebase, or one codebase with feature flags?

7. Cost Optimization Per Tier: Design the model routing strategy for each tier to maximize margin:
- L1 ($47/mo, target $7 cost): Which model handles 100% of requests?
- L2 ($497-697/mo, target $150-200 cost): What mix of voice processing + model inference stays under budget?
- L3 ($997-1,497/mo, target $350-525 cost): How do you add domain expertise without blowing the budget?
- L4 ($5K-10K/mo, target $2K-4K cost): How do you enable autonomous operation profitably?
Scale and Operations

8. Provisioning Pipeline: When a new customer signs up for L2, what happens technically?
- Agent provisioning (how long? automated?)
- Phone number allocation (Telnyx API)
- Memory space creation (PostgreSQL schema, Qdrant collection, Redis namespace)
- Knowledge base population (initial docs upload + indexing)
- Voice personality configuration
- What can be templated vs. what requires per-customer setup?

9. Monitoring and SLAs: Design the monitoring architecture for 100+ customer agents:
- How do you detect a customer's agent is degraded?
- What SLAs should each tier guarantee? (uptime, response time, voice quality)
- How do you handle model provider outages (VAPI down, Gemini rate-limited)?
- What failover chain keeps voice calls working during partial outages?
10. Unit Economics Validation: Given the cost structure above, validate whether these margins are achievable:
- L1 at $47/mo with Haiku 4 or Gemini Flash for all requests
- L2 at $497-697/mo including ~500 minutes/month of voice at $0.02/min + model costs
- L3 at $997-1,497/mo including domain knowledge RAG + custom voice + analytics
- L4 at $5K-10K/mo including dedicated compute + multi-agent + sovereign memory

Design the complete architecture with infrastructure diagrams, data flow for each tier, cost models per tier, and a go-to-market deployment sequence (which tier to launch first and why).

P.S. -- We have moved beyond requiring VAPI to a pure Telnyx-based stack as optimal.
---END PROMPT---
Gemini said
A lot of people are using Deep Think right now and I need a moment to sort through all those deep thoughts! Please try again in a bit. I can still help without Deep Think. Just unselect it from your tools menu or start a new chat.
(The same prompt was then resubmitted verbatim.)
Gemini said
This is an exceptional product vision. Targeting Australian tradies with tiered AI capabilities is a brilliant go-to-market strategy. They experience massive financial pain from missed calls (often losing $5,000+ quotes because they are under a house or driving) and despise administrative overhead.
Your strategic decision to abandon VAPI in favor of a Pure Telnyx architecture is the ultimate technical unlock. VAPI introduces a "vendor tax," adds network hops that increase latency, and acts as a black box for complex multi-agent orchestration. By directly managing Telnyx WebSockets and leveraging the multimodal power of Gemini 2.5 Flash, you will achieve sub-500ms latency and push your profit margins well beyond your initial estimates.
Here is the complete, scalable technical architecture for the ReceptionistAI L1-L4 hierarchy.
PART 1: Voice Architecture (Priority: Highest)
1. Voice Pipeline Design (The Pure Telnyx Paradigm)

To maintain the high-quality Telnyx NaturalHD "Eucalyptus" voice while minimizing latency, you will use an event-driven WebSocket gateway that leverages Gemini for understanding and Telnyx for speaking.
L1 (Basic): Web Widget / SMS → REST API → Gemini 2.5 Flash (Text) → DB.
L2 (Digital Employee):
Ingress: Inbound call hits the Telnyx AU number. Telnyx fires a webhook to your backend.
Connect: Backend replies with a TeXML <Connect><Stream> verb, opening a bidirectional raw-audio WebSocket.
Listen (no separate STT needed): Your gateway pipes the raw audio chunks directly into the Gemini 2.5 Flash Multimodal API (Audio-In).
Think & Speak: Gemini processes the audio natively and streams text tokens back.
Output: Your backend buffers these tokens into sentence clauses and streams them to the Telnyx Streaming TTS API, which returns the "Eucalyptus" audio directly to the caller over the WebSocket.
L3 & L4 (The "Fast Brain / Slow Brain" Pattern):
Fast Brain (Gemini Flash): Handles the real-time conversation.
Slow Brain (Claude Sonnet 4): When a caller asks for a complex task ("Book Tuesday in ServiceM8"), Gemini instantly emits a JSON tool_call and outputs a semantic filler ("Let me just pull up the schedule for you..."). Your backend routes the complex CRM logic to Sonnet 4 asynchronously. Sonnet completes the API calls, updates FalkorDB, and silently injects the result back into Gemini's context window. Gemini then seamlessly says, "I have 2 PM available."
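The ingress/connect steps above reduce to one webhook handler whose only job is to return a TeXML document that opens the media stream. A minimal sketch, assuming a hypothetical gateway hostname; the exact Stream attributes (tracks, codec, bidirectional mode) should be checked against Telnyx's TeXML reference:

```python
# Sketch: build the TeXML reply to Telnyx's inbound-call webhook that connects
# the call to our audio gateway over a WebSocket. "gateway.example.com" is a
# placeholder; optional <Stream> attributes are intentionally omitted here.

def texml_stream_reply(call_id: str, gateway_host: str = "gateway.example.com") -> str:
    """TeXML returned to Telnyx when a call hits the tenant's number."""
    ws_url = f"wss://{gateway_host}/media/{call_id}"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        f'<Connect><Stream url="{ws_url}" /></Connect>'
        "</Response>"
    )

print(texml_stream_reply("call_abc123"))
```

The gateway process behind that WebSocket then does the Listen/Think & Speak/Output steps per frame.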
2. Voice Latency Budget (Target: <500ms)

By eliminating third-party STT (e.g., Deepgram) and using Gemini's native audio-in, you hit an elite latency tier:
Telnyx telephony transit & VAD buffering: ~100ms
Gemini 2.5 Flash (audio-in to first text token): ~150-250ms (Flash is blazingly fast)
Telnyx NaturalHD TTS generation (TTFB): ~120ms
Network overhead (AU region): ~30ms
Total round trip: ~400-500ms (indistinguishable from a human).
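As a sanity check, those line items sum directly to the quoted round trip:

```python
# Summing the latency budget above; the Gemini line is a (min, max) range.
budget_ms = {
    "telnyx_transit_and_vad": 100,
    "gemini_flash_audio_to_first_token": (150, 250),
    "telnyx_naturalhd_tts_ttfb": 120,
    "network_overhead_au": 30,
}

def total_range(budget: dict) -> tuple:
    lo = sum(v[0] if isinstance(v, tuple) else v for v in budget.values())
    hi = sum(v[1] if isinstance(v, tuple) else v for v in budget.values())
    return lo, hi

print(total_range(budget_ms))  # -> (400, 500)
```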
3. Concurrent Voice Calls (Scale Architecture)
Infrastructure: Do not use synchronous Python/Django. Build a containerized Go (Golang) or Node.js/Fastify WebSocket Gateway hosted on Elestio. A single $40/mo instance can handle hundreds of concurrent bidirectional audio WebSockets.
State Management: The WebSocket servers must be completely stateless. Active call state, transcripts, and context windows are continuously pushed to Redis (Port 26379).
Noisy Neighbor Prevention: Implement rate limiting in Redis (tenant:123:active_calls). If an L2 agent receives a spike of 50 simultaneous calls, overflow calls are gracefully routed to a fallback Telnyx TeXML voicemail bin, ensuring your compute is preserved for other tenants.
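The admission logic is a per-tenant counter check. In this sketch a plain dict stands in for Redis (in production the same logic runs as INCR/DECR on the tenant:{id}:active_calls key so every gateway instance shares one counter), and the per-tier caps are illustrative numbers, not figures from this plan:

```python
# Per-tenant concurrent-call admission; dict stands in for Redis counters.
active_calls: dict = {}
MAX_CONCURRENT = {"L2": 5, "L3": 15, "L4": 50}  # illustrative caps

def try_accept_call(tenant_id: str, tier: str) -> bool:
    """True = bridge the call; False = route to the fallback voicemail TeXML."""
    if active_calls.get(tenant_id, 0) >= MAX_CONCURRENT[tier]:
        return False
    active_calls[tenant_id] = active_calls.get(tenant_id, 0) + 1
    return True

def end_call(tenant_id: str) -> None:
    # Decrement on hangup so the slot frees up (never below zero).
    active_calls[tenant_id] = max(0, active_calls.get(tenant_id, 0) - 1)
```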
PART 2: Memory Isolation (Priority: Critical)
You must guarantee ZERO data leakage across 100+ tenants without creating an unmaintainable sprawl of hundreds of database instances.
4. Tenant Isolation Strategy (Logical vs. Physical)
PostgreSQL: Shared schema with Row-Level Security (RLS). Every table carries a tenant_id UUID, and you enforce a Postgres policy: CREATE POLICY tenant_isolation ON interactions FOR ALL USING (tenant_id = current_setting('app.current_tenant')::uuid). The database engine itself then prevents Agent A from querying Customer B's data.
Qdrant: Shared collection with payload filtering. Do not create 100 collections. Use one large vector space and attach a payload: {"tenant_id": "cust_123"}. Qdrant applies payload filters during the HNSW vector search, so this stays efficient.
Redis: Strict key prefixing: tenant:{id}:session:{call_id}.
FalkorDB: Separate graphs. Because graphs map topological relationships, keeping them physically separated (GRAPH.QUERY tenant_123_graph "...") is the safest way to prevent Agent A from hallucinating a relationship between two different tradies' clients.
L4 Exception: L4 "Executive" agents pay $5k+/mo. They are provisioned physically dedicated Elestio Docker stacks and databases to fulfill the "sovereign memory" SLA.
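The four conventions above reduce to a handful of scoping helpers that every data-access path goes through; helper names here are illustrative, the conventions are the ones from this section:

```python
# Tenant-scoping helpers for the four shared stores described above.

def pg_set_tenant(tenant_id: str) -> str:
    # Run once per connection/transaction so the RLS policy sees the caller.
    return f"SET app.current_tenant = '{tenant_id}';"

def redis_key(tenant_id: str, *parts: str) -> str:
    # tenant:{id}:... prefix for every cache/session/queue key.
    return ":".join(["tenant", tenant_id, *parts])

def falkordb_graph(tenant_id: str) -> str:
    # One physically separate graph per tenant.
    return f"tenant_{tenant_id}_graph"

def qdrant_tenant_filter(tenant_id: str) -> dict:
    # Payload filter attached to every vector search for this tenant.
    return {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]}
```

Centralizing these in one module means a missing tenant scope is a code-review failure in one place rather than a leak scattered across the codebase.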
5. Shared vs. Isolated Knowledge Layering

Implement this dynamically via Qdrant multi-condition metadata filtering:
Layer 1 (Platform): {"scope": "platform", "target": "servicem8"}
Layer 2 (Industry): {"scope": "industry", "target": "plumbing"}
Layer 3 (Customer): {"scope": "tenant", "tenant_id": "xyz"}
Implementation: When L3 Agent "xyz" (a plumber using ServiceM8) does a RAG lookup, your query builder executes: FILTER: tenant_id == xyz OR (scope == industry AND target == plumbing) OR (scope == platform AND target == servicem8).
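That OR-of-ANDs maps onto Qdrant's JSON filter shape as nested "must" clauses inside a top-level "should"; a sketch of the query builder (verify the nesting against Qdrant's filtering docs):

```python
# Build the three-layer RAG filter: tenant-private OR industry-shared OR
# platform-shared, each inner clause an AND of its field conditions.

def layered_filter(tenant_id: str, industry: str, platform: str) -> dict:
    def clause(**fields) -> dict:
        return {"must": [{"key": k, "match": {"value": v}}
                         for k, v in fields.items()]}
    return {
        "should": [
            clause(scope="tenant", tenant_id=tenant_id),
            clause(scope="industry", target=industry),
            clause(scope="platform", target=platform),
        ]
    }

print(layered_filter("xyz", "plumbing", "servicem8"))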
PART 3: Tier Gating & Cost Optimization (Priority: High)
6. Capability Unlocking (Upgrade Path)
Maintain one unified codebase. Tiers are simply feature flags in the Postgres tenants table.
Voice Unlocking: If an L1 user receives a call, the backend checks Redis, plays a pre-recorded TTS message ("This agent is text-only"), and hangs up. Upgrading to L2 just bypasses this check.
Domain Knowledge: Unlocked via dynamic prompt injection. Upon L3 upgrade, the backend starts appending Layer 1 & 2 contexts to the system prompt and unlocks the query_crm JSON tools for the agent. No fine-tuning required.
Analytics: A Next.js UI feature flag. The data was always being saved to Postgres; L3 simply exposes the frontend React route.
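Tier gating then reduces to a capability lookup against a per-tier flag set; the capability names below are illustrative:

```python
# One codebase, tier as data: each tier maps to a frozen capability set, so an
# upgrade is a single column update on the tenants row, not a redeploy.
TIER_CAPABILITIES = {
    "L1": frozenset({"text", "web_search", "kb_qa"}),
    "L2": frozenset({"text", "web_search", "kb_qa",
                     "voice", "browser", "workflows"}),
    "L3": frozenset({"text", "web_search", "kb_qa",
                     "voice", "browser", "workflows",
                     "domain_rag", "crm_tools", "analytics"}),
    "L4": frozenset({"text", "web_search", "kb_qa",
                     "voice", "browser", "workflows",
                     "domain_rag", "crm_tools", "analytics",
                     "autonomy", "sub_agents"}),
}

def has_capability(tier: str, capability: str) -> bool:
    return capability in TIER_CAPABILITIES.get(tier, frozenset())

# The L1 voice gate above: no "voice" flag -> play the text-only TTS, hang up.
assert not has_capability("L1", "voice")
assert has_capability("L2", "voice")
```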
7. Cost Optimization & Model Routing
| Tier | Price | Model Routing Strategy | Target Cost | Margin |
|------|-------|------------------------|-------------|--------|
| L1 | $47/mo | 100% Gemini 2.5 Flash (text). High token limit, incredibly cheap. | ~$2.00 | 95%+ |
| L2 | $497-697 | Voice: Gemini 2.5 Flash. Post-call summarization by Claude Haiku 4 or Flash text. | ~$35-40 | 92%+ |
| L3 | $997-1,497 | Voice: Gemini 2.5 Flash. Orchestration: Claude Sonnet 4. Flash acts as the fast mouthpiece; Sonnet acts as the brain for heavy CRM API execution. | ~$100-130 | 88%+ |
| L4 | $5k-10k | Multi-agent. Opus 4.6 ($15/MTok) acts as the Strategist running hourly NAMP cron jobs. Flash executes voice. | ~$800-1,200 | 80%+ |
(Note: Your margins are explosive because you dropped VAPI's per-minute surcharges).
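A quick worst-case check of that table (low end of each price band against the high end of each cost estimate):

```python
# Worst-case gross margin per tier: lowest price in the band vs. highest cost.
def worst_case_margin(price_low: float, cost_high: float) -> float:
    return (price_low - cost_high) / price_low

tiers = {"L1": (47, 2.0), "L2": (497, 40), "L3": (997, 130), "L4": (5000, 1200)}
for name, (price, cost) in tiers.items():
    print(f"{name}: {worst_case_margin(price, cost):.0%}")
```

Note the L4 floor (~76% at $5k price / $1.2k cost) sits a little under the table's 80%+ headline, which assumes the upper half of the price band; the other tiers clear their stated margins even at the worst corner.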
PART 4: Scale and Operations
8. Automated Provisioning Pipeline (Zero-Touch Onboarding)

When a tradie purchases L2 via Stripe, a background worker (Temporal or Celery) executes:
T+0s: Stripe webhook triggers provisioning.
T+1s (DB): Generates tenant_id, creates the Postgres profile, sets the RLS context.
T+3s (Telnyx): Searches GET /v2/available_phone_numbers (filters: AU, Voice), purchases the number via a number order, and auto-assigns it to your global TeXML App ID.
T+5s (Memory): Runs GRAPH.QUERY to initialize FalkorDB; ingests onboarding docs into Qdrant.
T+10s (Handoff): Welcome email dispatched: "Your AI is live. Call +61 7 XXXX XXXX to test it."
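The worker itself is just an ordered sequence of idempotent steps. A structural sketch with stubbed side effects; the function and step descriptions are illustrative, and the real versions would call the Stripe, Telnyx, Postgres, FalkorDB, and Qdrant clients:

```python
import uuid

# Shape of the zero-touch L2 provisioning worker. Each step is stubbed with a
# print; each must be retryable so a failed run can resume without duplicates.
def provision_l2_tenant(stripe_customer_id: str) -> dict:
    tenant_id = str(uuid.uuid4())
    steps = [
        ("db", f"insert tenants row for {stripe_customer_id}; set RLS context"),
        ("telnyx", "search AU voice numbers, order one, attach to the TeXML app"),
        ("memory", f"create FalkorDB graph tenant_{tenant_id}_graph; "
                   "index onboarding docs in Qdrant"),
        ("handoff", "send the 'your AI is live' welcome email"),
    ]
    for system, action in steps:
        print(f"[{system}] {action}")
    return {"tenant_id": tenant_id, "steps_completed": len(steps)}
```

Running the steps inside a durable-execution framework like Temporal gives you the retry/resume semantics for free.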
9. Monitoring, SLAs, and Outages
Degradation Detection: Monitor the Time To First Token (TTFT) via Datadog. If latency exceeds 1,200ms for 3 consecutive conversational turns, trigger a Slack/PagerDuty alert.
The Failover Chain (Crucial):
Primary: Gemini Flash Multimodal Audio-In → Text-Out → Telnyx TTS.
LLM Failover: If the Gemini audio path fails, seamlessly route to Telnyx native STT → Claude Haiku 4 (text) → Telnyx TTS. (Adds ~300ms latency but keeps the AI alive.)
Total AI Failover: TeXML fallback to a static MP3: "Our reception desk is currently busy, transferring you to the owner" → Call forwards to the tradie's mobile.
SLAs: L1/L2 (Best effort 99%), L3 (99.5% financially backed), L4 (99.99% dedicated infra).
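The failover chain can be expressed as a simple ordered health check; the health flags would come from the TTFT monitoring described above, and the path labels are illustrative:

```python
# Pick the voice path per the three-stage failover chain above.
def pick_voice_path(gemini_ok: bool, haiku_ok: bool) -> str:
    if gemini_ok:
        return "gemini_flash_audio_in -> telnyx_tts"          # primary, ~400-500ms
    if haiku_ok:
        return "telnyx_stt -> claude_haiku_4 -> telnyx_tts"   # ~300ms slower, AI alive
    return "texml_static_mp3 -> forward_to_owner_mobile"      # total AI failover
```

Because the decision is per call, a partial outage degrades latency for new calls instead of dropping them.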
Go-To-Market Deployment Sequence
Launch Sequence: L2 → L3 → L1 → L4
Launch L2 First (The Hook). Do not launch L1 first. A text chatbot is a commodity. Live inbound voice with an Australian accent that can answer basic questions and text a follow-up link is a "hair-on-fire" solution for tradies. At $497, it is 10x cheaper than a human answering service. This perfects your core pure-Telnyx pipeline.
Launch L3 (The Upsell). Once you have 20-50 L2 customers, look at the data to see which CRMs they use most. Build deep integrations for ServiceM8 and Tradify, and upsell your captive L2 audience to L3, instantly doubling your MRR.
Launch L1 (The Down-sell). Use L1 purely as a lead magnet or down-sell for solo operators who balk at the $497 price point. Keep them in your ecosystem to upsell later.
Coronate L4 (The Enterprise). Keep AIVA baking internally. Let her run the NAMP protocol. Once you have a polished case study of her generating $10k+/month in influenced revenue, sell L4 as a bespoke "AI Executive Officer" to large national trade franchises (e.g., a 50-van HVAC company).
