====== Agent Personalization ======

Agent personalization enables LLM-powered agents to learn user preferences over time, maintaining persistent profiles that adapt communication style, decision-making, and tool use to individual users. Rather than treating every interaction as stateless, personalized agents build cumulative models of user behavior, preferences, and goals across sessions.

===== The Personalization Gap =====

Most LLM agents today are stateless --- they forget everything between sessions. Users must repeatedly restate preferences, correct communication styles, and re-explain context. This creates friction that limits agent adoption for long-term use cases like personal assistants, healthcare companions, and productivity tools.

Personalized agents address this through four interdependent capabilities:

  * **Profile modeling:** Inferring user traits, preferences, and goals from interactions
  * **Memory:** Persisting relevant information across sessions
  * **Planning:** Adapting task decomposition to user patterns
  * **Action execution:** Tailoring outputs to user communication preferences

===== PersonaMem =====

PersonaMem-v2 (University of Pennsylvania, 2025) is the state-of-the-art dataset for LLM personalization research. It simulates 1,000 realistic user-chatbot interactions across 300+ scenarios, with 20,000+ user preferences and 128k-token context windows. Critically, most preferences are //implicitly// revealed --- users do not explicitly state "I prefer formal language" but reveal it through interaction patterns, mirroring real-world behavior. The dataset enables evaluation of reinforcement fine-tuning for long-context reasoning about user understanding and preference extraction.

===== VARS: Vector-Adapted Retrieval Scoring =====

VARS (UIUC, March 2026) is a pipeline-agnostic, frozen-backbone framework for personalization without per-user fine-tuning.
Each user is represented by long-term and short-term vectors in a shared preference space:

$$\mathbf{s}_u = \alpha \cdot \mathbf{v}_{long} + (1 - \alpha) \cdot \mathbf{v}_{short}$$

These vectors bias retrieval scoring over structured preference memory and are updated online from weak scalar rewards (e.g., thumbs up/down). On the MultiSessionCollab benchmark, VARS reduces timeout rates and user effort while matching strong baselines in task success --- the key benefit is //interaction efficiency// rather than raw accuracy gains.

<code python>
# Simplified VARS-style user preference scoring
import numpy as np

class UserPreferenceModel:
    def __init__(self, dim=768, alpha=0.7, lr=0.01):
        self.v_long = np.zeros(dim)   # long-term preference vector
        self.v_short = np.zeros(dim)  # short-term session vector
        self.alpha = alpha            # blend weight between the two vectors
        self.lr = lr                  # learning rate for reward updates

    def score(self, candidates):
        # Blend long- and short-term vectors, then dot against each candidate
        user_vec = self.alpha * self.v_long + (1 - self.alpha) * self.v_short
        return np.array([np.dot(user_vec, c) for c in candidates])

    def update(self, chosen_embed, reward):
        # A weak scalar reward (e.g., +1/-1 from thumbs up/down) nudges the
        # session vector; the long-term vector tracks it via an exponential
        # moving average.
        self.v_short += self.lr * reward * chosen_embed
        self.v_long = 0.99 * self.v_long + 0.01 * self.v_short

    def retrieve_personalized(self, query_results, top_k=5):
        # Re-rank retrieval results by personalized score; sort on the score
        # only, since result dicts are not comparable on ties.
        embeddings = [r["embedding"] for r in query_results]
        scores = self.score(embeddings)
        ranked = sorted(zip(scores, query_results),
                        key=lambda pair: pair[0], reverse=True)
        return [item for _, item in ranked[:top_k]]
</code>

===== PersonaAgent =====

PersonaAgent (Amazon/UIC, 2025) is the first personalized LLM agent framework for multi-turn, long-horizon alignment. It integrates two modules:

  * **Personalized memory:** Combines episodic memory (specific past interactions) and semantic memory (generalized user knowledge) to build a coherent user model
  * **Personalized action:** Enables tool use tailored to user preferences and history

At the core, a //persona// --- a unique system prompt per user --- functions as an intermediary: it leverages memory insights to control agent actions, while action outcomes refine the persona over time.
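The persona-as-intermediary loop can be sketched as follows. This is a minimal illustration of the idea, not PersonaAgent's actual API: all function and variable names here (`build_persona_prompt`, `refine_persona`, the memory lists) are hypothetical.

<code python>
# Illustrative sketch of a persona-as-system-prompt loop.
# Names and structure are hypothetical, not PersonaAgent's real interface.

def build_persona_prompt(episodic, semantic):
    """Compose a per-user system prompt from memory insights."""
    recent = "\n".join(f"- {e}" for e in episodic[-3:])  # specific past interactions
    traits = "\n".join(f"- {t}" for t in semantic)       # generalized user knowledge
    return (
        "You are a personalized assistant for this user.\n"
        f"Known traits and preferences:\n{traits}\n"
        f"Relevant recent interactions:\n{recent}"
    )

def refine_persona(semantic, outcome_note):
    """Action outcomes feed back into semantic memory, refining the persona."""
    if outcome_note not in semantic:
        semantic.append(outcome_note)
    return semantic

# Usage: build the prompt, act, then fold the outcome back into memory
episodic = ["Asked for a terse summary of a PR", "Rejected a verbose draft"]
semantic = ["Prefers concise, technical answers"]
prompt = build_persona_prompt(episodic, semantic)
semantic = refine_persona(semantic, "Responds well to bullet-point plans")
</code>

The point of the sketch is the feedback cycle: memory shapes the system prompt, the prompt shapes actions, and action outcomes update memory for the next turn.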
===== Preference Learning Approaches =====

| **Approach** | **Mechanism** | **Pros** | **Cons** |
| Explicit feedback | Thumbs up/down, ratings | Clear signal | User fatigue, sparse |
| Implicit signals | Click patterns, dwell time, edits | Rich, continuous | Noisy, indirect |
| Reinforcement fine-tuning | RLHF/DPO on user data | Deep adaptation | Compute-heavy, per-user cost |
| Frozen-backbone (VARS) | Vector updates, no fine-tuning | Scalable, instant | Limited expressiveness |
| Persona prompting | Dynamic system prompt | Zero-cost, flexible | Context window limits |

===== Communication Style Adaptation =====

Effective personalization extends beyond content to //how// agents communicate:

  * **Formality level:** Adjusting between casual and professional registers
  * **Verbosity:** Matching user preference for concise vs. detailed responses
  * **Proactivity:** Learning when users want suggestions vs. waiting for instructions
  * **Domain vocabulary:** Adopting user-specific terminology and jargon
  * **Emotional tone:** Calibrating empathy and encouragement levels

These adaptations are typically captured as lightweight persona parameters updated from interaction signals, stored alongside factual preferences in the user profile.

===== References =====

  * [[https://arxiv.org/abs/2512.06688|PersonaMem-v2: Personalized Intelligence via Implicit User Personas (UPenn, 2025)]]
  * [[https://arxiv.org/abs/2603.20939|User Preference Modeling via VARS (UIUC, 2026)]]
  * [[https://arxiv.org/abs/2602.22680|Toward Personalized LLM-Powered Agents: Foundations and Evaluation (ShanghaiTech, 2026)]]
  * [[https://openreview.net/forum?id=PersonaAgent|PersonaAgent: LLM Agents Meet Personalization at Test Time (Amazon/UIC, 2025)]]
  * [[https://arxiv.org/abs/2510.07925|Persistent Memory and User Profiles for LLM Agents (Mercedes-Benz/Ulm, 2025)]]

===== See Also =====

  * [[collective_agent_behavior]]
  * [[small_language_model_agents]]
  * [[agent_cost_optimization]]