====== On-Device Agents ======

**On-Device Agents** are AI agent systems that run entirely on local hardware -- smartphones, tablets, PCs, and edge devices -- without requiring cloud connectivity for inference. By shifting computation to the device, on-device agents deliver ultra-low latency, privacy-by-design data handling, and offline reliability. Recent advances in small language models and specialized function-calling architectures have made agentic capabilities practical on mobile hardware.

===== Overview =====

Cloud-based AI agents require a network round-trip for every inference call, adding latency, incurring per-request API costs, and sending potentially sensitive user data to external servers. On-device agents eliminate these constraints: the model runs on the local processor, functions in airplane mode, incurs no per-request charges, and processes sensitive data without it ever leaving the device.

The key challenge is fitting capable models -- especially those with function-calling abilities -- into the memory and compute budgets of mobile hardware while maintaining accuracy and battery efficiency.

===== Google AI Edge =====

**Google AI Edge** is Google's platform for deploying AI models on mobile and edge devices. The ecosystem includes:

=== Google AI Edge Gallery ===

A showcase application (available on Android and iOS) demonstrating on-device AI capabilities powered by Gemma and other open-weight models. In February 2026, Google announced major updates including on-device function calling support and cross-platform iOS availability.
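The memory constraint described above can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: it assumes weights dominate the footprint and ignores activations, KV cache, and runtime overhead.

```python
def weight_memory_mb(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed to hold model weights, in MiB."""
    return num_params * bits_per_weight / 8 / (1024 ** 2)

# A FunctionGemma-sized model (270M parameters) at common quantization levels
params_270m = 270_000_000

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(params_270m, bits):7.1f} MiB")
```

Dropping from 32-bit to 4-bit weights shrinks the weight footprint roughly eightfold -- the difference between overflowing and comfortably fitting a mobile memory budget -- which is why quantization is a standard step in on-device deployment.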
=== Gemini Nano ===

The smallest model in Google's Gemini family, designed for on-device inference:

  * **Zero network latency** -- near-instant responses with no server round-trips
  * **No API costs** -- all inference runs locally
  * **Privacy guarantees** -- user data never leaves the device
  * **Offline operation** -- works in airplane mode, tunnels, and areas with poor connectivity
  * **Availability** -- on Android devices and in the Chrome browser via built-in APIs

=== FunctionGemma ===

Released in December 2025, **FunctionGemma** is a specialized version of Gemma 3 (270M parameters) fine-tuned specifically for function calling. Key characteristics:

  * Translates natural language into executable API actions
  * Designed as a base for further training into custom, domain-specific agents
  * Can act as a fully independent agent for offline tasks or as an intelligent traffic controller routing requests to larger models
  * Lightweight enough (270M parameters) to run on mobile hardware

=== AI Edge Function Calling SDK ===

A library enabling developers to use function calling with on-device LLMs. The pipeline involves:

  - Define function declarations (names, parameters, types)
  - Format prompts for the LLM, including function schemas
  - Parse LLM outputs to detect function calls
  - Execute detected function calls with the appropriate parameters

===== On-Device Function Calling =====

The real power of on-device agents emerges when models can invoke functions -- opening apps, adjusting settings, creating calendar entries, or navigating to destinations. This transforms passive text generation into active device interaction.
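Step 3 of the SDK pipeline -- detecting a function call in raw model output -- can be sketched as follows. The JSON call format and the `parse_function_call` helper are illustrative assumptions for this article, not the actual AI Edge SDK API, which handles parsing internally.

```python
import json
import re

def parse_function_call(output: str):
    """Detect a function call in raw model output.

    Assumes (for illustration) that the model was prompted to emit calls
    as JSON objects like {"name": ..., "parameters": {...}}.
    Returns (name, parameters) or None for a plain text response.
    """
    match = re.search(r"\{.*\}", output, re.DOTALL)
    if not match:
        return None  # no JSON object present: plain text response
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # malformed JSON: treat as plain text
    if isinstance(call, dict) and "name" in call:
        return call["name"], call.get("parameters", {})
    return None

# Example: model output embedding a call
raw = 'Sure. {"name": "set_alarm", "parameters": {"time": "07:30"}}'
parse_function_call(raw)  # -> ("set_alarm", {"time": "07:30"})
```

Falling back to `None` on anything that does not parse cleanly is important on-device: a malformed call should degrade to a text reply rather than trigger an unintended device action.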
<code python>
from dataclasses import dataclass

@dataclass
class FunctionDeclaration:
    name: str
    description: str
    parameters: dict

# Define available device functions
device_functions = [
    FunctionDeclaration(
        name="create_calendar_event",
        description="Create a new calendar event",
        parameters={
            "title": {"type": "string", "required": True},
            "datetime": {"type": "string", "format": "iso8601"},
            "duration_minutes": {"type": "integer", "default": 60},
        },
    ),
    FunctionDeclaration(
        name="set_alarm",
        description="Set a device alarm",
        parameters={
            "time": {"type": "string", "format": "HH:MM"},
            "label": {"type": "string", "default": "Alarm"},
        },
    ),
    FunctionDeclaration(
        name="navigate_to",
        description="Open navigation to a destination",
        parameters={
            "destination": {"type": "string", "required": True},
            "mode": {"type": "string", "enum": ["driving", "walking"]},
        },
    ),
]

class OnDeviceAgent:
    def __init__(self, model, functions):
        self.model = model
        self.functions = {f.name: f for f in functions}

    def process_input(self, user_input: str) -> dict:
        schema_text = self._format_schemas()
        prompt = f"Functions:\n{schema_text}\nUser: {user_input}\nAction:"
        output = self.model.generate(prompt)
        if self._is_function_call(output):
            func_name, params = self._parse_function_call(output)
            return {"type": "function_call", "name": func_name, "params": params}
        return {"type": "text", "content": output}

    def _format_schemas(self) -> str:
        return "\n".join(
            f"- {f.name}: {f.description}"
            for f in self.functions.values()
        )

    def _is_function_call(self, output: str) -> bool:
        # Heuristic: output of the form "name(...)" where name is a
        # declared function is treated as a call
        name = output.split("(", 1)[0].strip()
        return "(" in output and name in self.functions

    def _parse_function_call(self, output: str) -> tuple:
        # Assumes the model emits calls as "name(key=value, ...)"
        name, _, rest = output.partition("(")
        arg_text = rest.rsplit(")", 1)[0]
        params = {}
        for pair in arg_text.split(","):
            if "=" not in pair:
                continue
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip().strip("'\"")
        return name.strip(), params
</code>

===== Key Challenges =====

  * **Hardware Constraints** -- models must be compressed (quantization, pruning, distillation) to fit mobile memory and compute budgets
  * **Model Updates** -- managing model version deployment across diverse device fleets
  * **Platform Fragmentation** -- tailoring solutions for Android, iOS, web, and embedded systems
  * **Accuracy vs Efficiency** -- balancing model capability against battery life and thermal constraints
  * **Function Safety** -- ensuring on-device function calls cannot be
exploited for unauthorized device access

===== Frameworks and Tools =====

^ Framework ^ Platform ^ Purpose ^
| Google AI Edge Gallery | Android, iOS | Showcase and SDK for on-device models |
| FunctionGemma | Cross-platform | 270M-parameter function-calling model |
| Gemini Nano | Android, Chrome | Built-in on-device inference |
| TensorFlow Lite / LiteRT | Android, iOS, embedded | Model deployment runtime |
| Core ML | iOS, macOS | Apple on-device ML framework |
| Qualcomm AI Engine | Android (Snapdragon) | Hardware-accelerated inference |

===== References =====

  * [[https://developers.googleblog.com/on-device-function-calling-in-google-ai-edge-gallery/|Google Developers Blog -- On-Device Function Calling in AI Edge Gallery]]
  * [[https://blog.google/technology/developers/functiongemma|Google Blog -- FunctionGemma for Function Calling]]
  * [[https://ai.google.dev/edge/mediapipe/solutions/genai/function_calling|Google AI Edge -- Function Calling Guide]]
  * [[https://gemilab.net/en/articles/gemini-dev/gemini-nano-on-device-ai-guide|Gemini Nano -- On-Device AI Guide]]

===== See Also =====

  * [[small_language_models|Small Language Models]]
  * [[tool_use|Tool Use in LLM Agents]]
  * [[edge_computing|Edge Computing for AI]]
  * [[agent_function_calling|Agent Function Calling]]