On-Device Agents

On-Device Agents are AI agent systems that run entirely on local hardware – smartphones, tablets, PCs, and edge devices – without requiring cloud connectivity for inference. By shifting computation to the device, on-device agents deliver ultra-low latency, privacy-by-design data handling, and offline reliability. Recent advances in small language models and specialized function-calling architectures have made agentic capabilities practical on mobile hardware.

Overview

Cloud-based AI agents require network round-trips for every inference call, adding latency, incurring per-request API costs, and sending potentially sensitive user data to external servers. On-device agents eliminate these constraints: the model runs on the local processor, functions in airplane mode, incurs no per-request charges, and processes sensitive data without it ever leaving the device.

The key challenge is fitting capable models – especially those with function-calling abilities – into the memory and compute constraints of mobile hardware while maintaining accuracy and battery efficiency.
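To make the memory constraint concrete, here is a back-of-the-envelope sketch of weight memory at common quantization levels. The arithmetic is illustrative only (weights alone; real runtimes add activation and KV-cache overhead), and the 270M figure is chosen to match the model size discussed below.

```python
# Rough memory estimate for model weights at different quantization levels.
# Illustrative arithmetic only, not measured on-device footprints.

def weight_memory_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight memory in MB (1 MB = 2**20 bytes)."""
    return num_params * bits_per_weight / 8 / 2**20

params_270m = 270_000_000  # a 270M-parameter model

for bits in (32, 16, 8, 4):
    print(f"{bits}-bit: {weight_memory_mb(params_270m, bits):.0f} MB")
```

Quantizing from 32-bit floats to 4-bit integers shrinks the weight footprint roughly eightfold, which is what makes sub-billion-parameter models viable in a phone's memory budget.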

Google AI Edge

Google AI Edge is Google's platform for deploying AI models on mobile and edge devices. The ecosystem includes a showcase application (available on Android and iOS) demonstrating on-device AI capabilities powered by Gemma and other open-weight models. In February 2026, Google announced major updates, including on-device function calling support and cross-platform iOS availability.

Gemini Nano

Gemini Nano is the smallest model in Google's Gemini family and is designed for on-device inference.

FunctionGemma

Released in December 2025, FunctionGemma is a specialized version of Gemma 3 (270M parameters) fine-tuned specifically for function calling.

AI Edge Function Calling SDK

A library enabling developers to use function calling with on-device LLMs. The pipeline involves:

  1. Define function declarations (names, parameters, types)
  2. Format prompts for the LLM including function schemas
  3. Parse LLM outputs to detect function calls
  4. Execute detected function calls with appropriate parameters

On-Device Function Calling

The real power of on-device agents emerges when models can invoke functions – opening apps, adjusting settings, creating calendar entries, or navigating to destinations. This transforms passive text generation into active device interaction.

from dataclasses import dataclass
 
@dataclass
class FunctionDeclaration:
    name: str
    description: str
    parameters: dict
 
# Define available device functions
device_functions = [
    FunctionDeclaration(
        name="create_calendar_event",
        description="Create a new calendar event",
        parameters={
            "title": {"type": "string", "required": True},
            "datetime": {"type": "string", "format": "iso8601"},
            "duration_minutes": {"type": "integer", "default": 60}
        }
    ),
    FunctionDeclaration(
        name="set_alarm",
        description="Set a device alarm",
        parameters={
            "time": {"type": "string", "format": "HH:MM"},
            "label": {"type": "string", "default": "Alarm"}
        }
    ),
    FunctionDeclaration(
        name="navigate_to",
        description="Open navigation to a destination",
        parameters={
            "destination": {"type": "string", "required": True},
            "mode": {"type": "string", "enum": ["driving", "walking"]}
        }
    ),
]
 
class OnDeviceAgent:
    def __init__(self, model, functions):
        self.model = model
        self.functions = {f.name: f for f in functions}
 
    def process_input(self, user_input: str) -> dict:
        schema_text = self._format_schemas()
        prompt = f"Functions:\n{schema_text}\nUser: {user_input}\nAction:"
        output = self.model.generate(prompt)
 
        if self._is_function_call(output):
            func_name, params = self._parse_function_call(output)
            return {"type": "function_call", "name": func_name, "params": params}
        return {"type": "text", "content": output}
 
    def _format_schemas(self) -> str:
        return "\n".join(
            f"- {f.name}: {f.description}"
            for f in self.functions.values()
        )

    def _is_function_call(self, output: str) -> bool:
        # Treat output of the form name(arg=value, ...) as a call,
        # but only if the name matches a declared function.
        head = output.strip().split("(", 1)[0]
        return head in self.functions

    def _parse_function_call(self, output: str):
        # Minimal parser assuming the model emits name(key="value", ...);
        # a production parser would validate against the declared schema.
        text = output.strip()
        name, _, rest = text.partition("(")
        params = {}
        args = rest.rstrip(")").strip()
        if args:
            for pair in args.split(","):
                key, _, value = pair.partition("=")
                params[key.strip()] = value.strip().strip("\"'")
        return name, params

Key Challenges

The central challenges, as noted above, are fitting capable function-calling models into the limited memory and compute of mobile hardware, preserving accuracy under aggressive model compression, and keeping sustained inference battery-efficient.

Frameworks and Tools

Framework                  Platform                Purpose
Google AI Edge Gallery     Android, iOS            Showcase and SDK for on-device models
FunctionGemma              Cross-platform          270M parameter function-calling model
Gemini Nano                Android, Chrome         Built-in on-device inference
TensorFlow Lite / LiteRT   Android, iOS, embedded  Model deployment runtime
Core ML                    iOS, macOS              Apple on-device ML framework
Qualcomm AI Engine         Android (Snapdragon)    Hardware-accelerated inference
