Core Concepts
Reasoning Techniques
Memory Systems
Retrieval
Agent Types
Design Patterns
Training & Alignment
Frameworks
Tools & Products
Code & Software
Safety & Security
Evaluation
Research
Development
Meta
On-Device Agents are AI agent systems that run entirely on local hardware – smartphones, tablets, PCs, and edge devices – without requiring cloud connectivity for inference. By shifting computation to the device, on-device agents deliver ultra-low latency, privacy-by-design data handling, and offline reliability. Recent advances in small language models and specialized function-calling architectures have made agentic capabilities practical on mobile hardware.
Cloud-based AI agents require network round-trips for every inference call, adding latency, incurring per-request API costs, and sending potentially sensitive user data to external servers. On-device agents eliminate these constraints: the model runs on the local processor, functions in airplane mode, incurs no per-request charges, and processes sensitive data without it ever leaving the device.
The key challenge is fitting capable models – especially those with function-calling abilities – into the memory and compute constraints of mobile hardware while maintaining accuracy and battery efficiency.
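To make the memory constraint concrete, here is a back-of-the-envelope calculation of weight storage at different quantization levels, using the 270M-parameter size cited below for FunctionGemma. The figures are illustrative only; real runtimes add KV-cache and activation overhead on top of the weights.

```python
def model_memory_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight memory in MB at a given quantization level."""
    return num_params * bits_per_weight / 8 / 1e6

# Weight memory for a 270M-parameter model at common precisions:
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {model_memory_mb(270_000_000, bits):.0f} MB")
# 32-bit: 1080 MB
# 16-bit:  540 MB
#  8-bit:  270 MB
#  4-bit:  135 MB
```

At 4-bit quantization the same model fits in roughly an eighth of its float32 footprint, which is what makes sub-billion-parameter models practical on phones.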
Google AI Edge is Google's platform for deploying AI models on mobile and edge devices. The ecosystem includes:
The Google AI Edge Gallery, a showcase application (available on Android and iOS) demonstrating on-device AI capabilities powered by Gemma and other open-weight models. In February 2026, Google announced major updates including on-device function calling support and cross-platform iOS availability.
Gemini Nano is the smallest model in Google's Gemini family, designed for on-device inference.
FunctionGemma, released in December 2025, is a specialized version of Gemma 3 (270M parameters) fine-tuned specifically for function calling.
A library enabling developers to use function calling with on-device LLMs. The typical pipeline: declare the available functions, format their schemas into the prompt, run inference, parse the model output for a function call, execute it in app code, and return the result to the model.
The real power of on-device agents emerges when models can invoke functions – opening apps, adjusting settings, creating calendar entries, or navigating to destinations. This transforms passive text generation into active device interaction.
```python
import json
from dataclasses import dataclass


@dataclass
class FunctionDeclaration:
    name: str
    description: str
    parameters: dict


# Define available device functions
device_functions = [
    FunctionDeclaration(
        name="create_calendar_event",
        description="Create a new calendar event",
        parameters={
            "title": {"type": "string", "required": True},
            "datetime": {"type": "string", "format": "iso8601"},
            "duration_minutes": {"type": "integer", "default": 60},
        },
    ),
    FunctionDeclaration(
        name="set_alarm",
        description="Set a device alarm",
        parameters={
            "time": {"type": "string", "format": "HH:MM"},
            "label": {"type": "string", "default": "Alarm"},
        },
    ),
    FunctionDeclaration(
        name="navigate_to",
        description="Open navigation to a destination",
        parameters={
            "destination": {"type": "string", "required": True},
            "mode": {"type": "string", "enum": ["driving", "walking"]},
        },
    ),
]


class OnDeviceAgent:
    def __init__(self, model, functions):
        self.model = model
        self.functions = {f.name: f for f in functions}

    def process_input(self, user_input: str) -> dict:
        schema_text = self._format_schemas()
        prompt = f"Functions:\n{schema_text}\nUser: {user_input}\nAction:"
        output = self.model.generate(prompt)
        if self._is_function_call(output):
            func_name, params = self._parse_function_call(output)
            return {"type": "function_call", "name": func_name, "params": params}
        return {"type": "text", "content": output}

    def _format_schemas(self) -> str:
        return "\n".join(
            f"- {f.name}: {f.description}" for f in self.functions.values()
        )

    def _is_function_call(self, output: str) -> bool:
        # Assumes the model emits calls as JSON: {"name": ..., "params": {...}}
        try:
            call = json.loads(output)
        except (ValueError, TypeError):
            return False
        return isinstance(call, dict) and call.get("name") in self.functions

    def _parse_function_call(self, output: str) -> tuple:
        call = json.loads(output)
        return call["name"], call.get("params", {})
```
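The agent above only returns a parsed call; the remaining step is dispatching it to local device APIs. A minimal sketch of that dispatch layer, where the handler functions are hypothetical stand-ins for real platform intents (calendar, alarm clock, maps):

```python
# Hypothetical local handlers; on a real device these would invoke
# platform APIs (e.g. Android intents) rather than return strings.
def create_calendar_event(title, datetime=None, duration_minutes=60):
    return f"Created event '{title}' ({duration_minutes} min)"

def set_alarm(time, label="Alarm"):
    return f"Alarm '{label}' set for {time}"

def navigate_to(destination, mode="driving"):
    return f"Navigating to {destination} by {mode}"

HANDLERS = {
    "create_calendar_event": create_calendar_event,
    "set_alarm": set_alarm,
    "navigate_to": navigate_to,
}

def execute(result: dict) -> str:
    """Route an agent result to a handler, or pass plain text through."""
    if result["type"] == "function_call":
        handler = HANDLERS.get(result["name"])
        if handler is None:
            return f"Unknown function: {result['name']}"
        return handler(**result["params"])
    return result["content"]

print(execute({"type": "function_call", "name": "set_alarm",
               "params": {"time": "07:30", "label": "Gym"}}))
# Alarm 'Gym' set for 07:30
```

Keeping the handler table separate from the agent makes it easy to gate sensitive functions (e.g. requiring user confirmation before a calendar write) without touching the model-facing code.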
| Framework | Platform | Purpose |
|---|---|---|
| Google AI Edge Gallery | Android, iOS | Showcase and SDK for on-device models |
| FunctionGemma | Cross-platform | 270M parameter function-calling model |
| Gemini Nano | Android, Chrome | Built-in on-device inference |
| TensorFlow Lite / LiteRT | Android, iOS, embedded | Model deployment runtime |
| Core ML | iOS, macOS | Apple on-device ML framework |
| Qualcomm AI Engine | Android (Snapdragon) | Hardware-accelerated inference |