====== On-Device Agents ======

**On-Device Agents** are AI agent systems that run entirely on local hardware -- smartphones, tablets, PCs, and edge devices -- without requiring cloud connectivity for inference. By shifting computation to the device, on-device agents deliver ultra-low latency, privacy-by-design data handling, and offline reliability. Recent advances in small language models and specialized function-calling architectures have made agentic capabilities practical on mobile hardware.

===== Overview =====

Cloud-based AI agents require a network round-trip for every inference call, adding latency, incurring per-request API costs, and sending potentially sensitive user data to external servers. On-device agents eliminate these constraints: the model runs on the local processor, functions in airplane mode, incurs no per-request charges, and processes sensitive data without it ever leaving the device.

The key challenge is fitting capable models -- especially those with function-calling abilities -- into the memory and compute budgets of mobile hardware while maintaining accuracy and battery efficiency.

===== Google AI Edge =====

**Google AI Edge** is Google's platform for deploying AI models on mobile and edge devices. The ecosystem includes:

=== Google AI Edge Gallery ===

A showcase application (available on Android and iOS) demonstrating on-device AI capabilities powered by Gemma and other open-weight models. In February 2026, Google announced major updates including on-device function calling support and cross-platform iOS availability.
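The memory constraint described above can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: it assumes weights dominate the footprint and ignores activations, KV cache, and runtime overhead.

```python
def weight_memory_mb(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed to hold model weights, in MiB."""
    return num_params * bits_per_weight / 8 / (1024 ** 2)

# A FunctionGemma-sized model (270M parameters) at common quantization levels
params_270m = 270_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(params_270m, bits):7.1f} MiB")
```

Dropping from 32-bit to 4-bit weights shrinks the weight footprint roughly eightfold -- the difference between overflowing and comfortably fitting a mobile memory budget -- which is why quantization is a standard step in on-device deployment.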
=== Gemini Nano ===

The smallest model in Google's Gemini family, designed for on-device inference:

  * **Zero network latency** -- near-instant responses with no server round-trips
  * **No API costs** -- all inference runs locally
  * **Privacy guarantees** -- user data never leaves the device
  * **Offline operation** -- works in airplane mode, tunnels, and areas with poor connectivity
  * **Availability** -- on Android devices and in the Chrome browser via built-in APIs

=== FunctionGemma ===

Released in December 2025, **FunctionGemma** is a specialized version of Gemma 3 (270M parameters) fine-tuned specifically for function calling. Key characteristics:

  * Translates natural language into executable API actions
  * Designed as a base for further training into custom, domain-specific agents
  * Can act as a fully independent agent for offline tasks or as an intelligent traffic controller routing requests to larger models
  * Lightweight enough (270M parameters) to run on mobile hardware

=== AI Edge Function Calling SDK ===

A library enabling developers to use function calling with on-device LLMs. The pipeline involves:

  - Define function declarations (names, parameters, types)
  - Format prompts for the LLM, including function schemas
  - Parse LLM outputs to detect function calls
  - Execute detected function calls with the appropriate parameters

===== On-Device Function Calling =====

The real power of on-device agents emerges when models can invoke functions -- opening apps, adjusting settings, creating calendar entries, or navigating to destinations. This transforms passive text generation into active device interaction.
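Step 3 of the SDK pipeline -- detecting a function call in raw model output -- can be sketched as follows. The JSON call format and the `parse_function_call` helper are illustrative assumptions for this article, not the actual AI Edge SDK API, which handles parsing internally.

```python
import json
import re

def parse_function_call(output: str):
    """Detect a function call in raw model output.

    Assumes (for illustration) that the model was prompted to emit calls
    as JSON objects like {"name": ..., "parameters": {...}}.
    Returns (name, parameters) or None for a plain text response.
    """
    match = re.search(r"\{.*\}", output, re.DOTALL)
    if not match:
        return None  # no JSON object present: plain text response
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # malformed JSON: treat as plain text
    if isinstance(call, dict) and "name" in call:
        return call["name"], call.get("parameters", {})
    return None

# Example: model output embedding a call
raw = 'Sure. {"name": "set_alarm", "parameters": {"time": "07:30"}}'
parse_function_call(raw)  # -> ("set_alarm", {"time": "07:30"})
```

Falling back to `None` on anything that does not parse cleanly is important on-device: a malformed call should degrade to a text reply rather than trigger an unintended device action.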
<code python>
from dataclasses import dataclass

@dataclass
class FunctionDeclaration:
    name: str
    description: str
    parameters: dict

# Define available device functions
device_functions = [
    FunctionDeclaration(
        name="create_calendar_event",
        description="Create a new calendar event",
        parameters={
            "title": {"type": "string", "required": True},
            "datetime": {"type": "string", "format": "iso8601"},
            "duration_minutes": {"type": "integer", "default": 60},
        },
    ),
    FunctionDeclaration(
        name="set_alarm",
        description="Set a device alarm",
        parameters={
            "time": {"type": "string", "format": "HH:MM"},
            "label": {"type": "string", "default": "Alarm"},
        },
    ),
    FunctionDeclaration(
        name="navigate_to",
        description="Open navigation to a destination",
        parameters={
            "destination": {"type": "string", "required": True},
            "mode": {"type": "string", "enum": ["driving", "walking"]},
        },
    ),
]

class OnDeviceAgent:
    def __init__(self, model, functions):
        self.model = model
        self.functions = {f.name: f for f in functions}

    def process_input(self, user_input: str) -> dict:
        schema_text = self._format_schemas()
        prompt = f"Functions:\n{schema_text}\nUser: {user_input}\nAction:"
        output = self.model.generate(prompt)
        if self._is_function_call(output):
            func_name, params = self._parse_function_call(output)
            return {"type": "function_call", "name": func_name, "params": params}
        return {"type": "text", "content": output}

    def _format_schemas(self) -> str:
        return "\n".join(
            f"- {f.name}: {f.description}"
            for f in self.functions.values()
        )

    def _is_function_call(self, output: str) -> bool:
        # Heuristic: output of the form "name(...)" where name is a
        # declared function is treated as a call
        name = output.split("(", 1)[0].strip()
        return "(" in output and name in self.functions

    def _parse_function_call(self, output: str) -> tuple:
        # Assumes the model emits calls as "name(key=value, ...)"
        name, _, rest = output.partition("(")
        arg_text = rest.rsplit(")", 1)[0]
        params = {}
        for pair in arg_text.split(","):
            if "=" not in pair:
                continue
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip().strip("'\"")
        return name.strip(), params
</code>

===== Key Challenges =====

  * **Hardware Constraints** -- models must be compressed (quantization, pruning, distillation) to fit mobile memory and compute budgets
  * **Model Updates** -- managing model version deployment across diverse device fleets
  * **Platform Fragmentation** -- tailoring solutions for Android, iOS, web, and embedded systems
  * **Accuracy vs Efficiency** -- balancing model capability against battery life and thermal constraints
  * **Function Safety** -- ensuring on-device function calls cannot be
exploited for unauthorized device access

===== Frameworks and Tools =====

^ Framework ^ Platform ^ Purpose ^
| Google AI Edge Gallery | Android, iOS | Showcase and SDK for on-device models |
| FunctionGemma | Cross-platform | 270M-parameter function-calling model |
| Gemini Nano | Android, Chrome | Built-in on-device inference |
| TensorFlow Lite / LiteRT | Android, iOS, embedded | Model deployment runtime |
| Core ML | iOS, macOS | Apple on-device ML framework |
| Qualcomm AI Engine | Android (Snapdragon) | Hardware-accelerated inference |

===== References =====

  * [[https://developers.googleblog.com/on-device-function-calling-in-google-ai-edge-gallery/|Google Developers Blog -- On-Device Function Calling in AI Edge Gallery]]
  * [[https://blog.google/technology/developers/functiongemma|Google Blog -- FunctionGemma for Function Calling]]
  * [[https://ai.google.dev/edge/mediapipe/solutions/genai/function_calling|Google AI Edge -- Function Calling Guide]]
  * [[https://gemilab.net/en/articles/gemini-dev/gemini-nano-on-device-ai-guide|Gemini Nano -- On-Device AI Guide]]

===== See Also =====

  * [[small_language_models|Small Language Models]]
  * [[tool_use|Tool Use in LLM Agents]]
  * [[edge_computing|Edge Computing for AI]]
  * [[agent_function_calling|Agent Function Calling]]