Agent Personalization

Agent personalization enables LLM-powered agents to learn user preferences over time, maintaining persistent profiles that adapt communication style, decision-making, and tool use to individual users. Rather than treating every interaction as stateless, personalized agents build cumulative models of user behavior, preferences, and goals across sessions.

The Personalization Gap

Most LLM agents today are stateless — they forget everything between sessions. Users must repeatedly restate preferences, correct communication styles, and re-explain context. This creates friction that limits agent adoption for long-term use cases like personal assistants, healthcare companions, and productivity tools.

Personalized agents address this through several interdependent capabilities — persistent memory, preference learning, persona modeling, and style adaptation — which the approaches below illustrate.

PersonaMem

PersonaMem-v2 (University of Pennsylvania, 2025) is the state-of-the-art dataset for LLM personalization research. It simulates 1,000 realistic user-chatbot interactions across 300+ scenarios with 20,000+ user preferences and 128k-token context windows. Critically, most preferences are implicitly revealed — users do not explicitly state “I prefer formal language” but reveal it through interaction patterns, mirroring real-world behavior.

The dataset enables evaluation of reinforcement fine-tuning for long-context reasoning about user understanding and preference extraction.
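The implicit-preference setting can be made concrete with a toy example: the user never says "I prefer concise answers", but a mid-conversation correction reveals it. A minimal sketch with an illustrative schema and cue list (not the dataset's actual format):

```python
# Hypothetical PersonaMem-style interaction in which a brevity
# preference is revealed implicitly, plus a toy detector.
history = [
    {"role": "user", "content": "Explain gradient descent."},
    {"role": "assistant", "content": "(three dense paragraphs)"},
    {"role": "user", "content": "Too long. Just the key idea, please."},
    {"role": "assistant", "content": "Step iteratively against the gradient."},
    {"role": "user", "content": "Perfect, thanks."},
]

def infer_implicit_preference(turns):
    """Toy heuristic: spot a brevity preference the user never states outright."""
    cues = ("too long", "shorter", "key idea", "tl;dr")
    for turn in turns:
        text = turn["content"].lower()
        if turn["role"] == "user" and any(cue in text for cue in cues):
            return "concise answers"
    return None  # no implicit style preference detected
```

A real system would use an LLM rather than keyword cues, but the evaluation question is the same: can the model act on a preference it was never told?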

VARS: Vector-Adapted Retrieval Scoring

VARS (UIUC, March 2026) is a pipeline-agnostic, frozen-backbone framework for personalization without per-user fine-tuning. Each user is represented by long-term and short-term vectors in a shared preference space:

$$\mathbf{s}_u = \alpha \cdot \mathbf{v}_{long} + (1 - \alpha) \cdot \mathbf{v}_{short}$$

These vectors bias retrieval scoring over structured preference memory and are updated online from weak scalar rewards (e.g., thumbs up/down). On the MultiSessionCollab benchmark, VARS reduces timeout rates and user effort while matching strong baselines in task success — the key benefit is interaction efficiency rather than raw accuracy gains.

# Simplified VARS-style user preference scoring
import numpy as np

class UserPreferenceModel:
    def __init__(self, dim=768, alpha=0.7, lr=0.01):
        self.v_long = np.zeros(dim)   # long-term preference vector
        self.v_short = np.zeros(dim)  # short-term session vector
        self.alpha = alpha            # weight on long-term preferences
        self.lr = lr                  # online learning rate

    def score(self, candidates):
        # Blend long- and short-term vectors into one user vector,
        # then score each candidate embedding by dot product.
        user_vec = self.alpha * self.v_long + (1 - self.alpha) * self.v_short
        return np.array([np.dot(user_vec, c) for c in candidates])

    def update(self, chosen_embed, reward):
        # A weak scalar reward (e.g., +1/-1 for thumbs up/down) nudges
        # the session vector; the long-term vector tracks it slowly.
        self.v_short += self.lr * reward * chosen_embed
        self.v_long = 0.99 * self.v_long + 0.01 * self.v_short

    def retrieve_personalized(self, query_results, top_k=5):
        embeddings = [r["embedding"] for r in query_results]
        scores = self.score(embeddings)
        # Sort by score only; sorting (score, dict) tuples would raise
        # a TypeError on score ties because dicts are not comparable.
        order = np.argsort(scores)[::-1][:top_k]
        return [query_results[i] for i in order]
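Plugging concrete numbers into the update rules above shows the mechanism: repeated positive feedback on one memory entry rotates the combined user vector toward that entry's embedding. A standalone sketch with illustrative dimensions and rates:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, alpha, lr = 8, 0.7, 0.1
v_long = np.zeros(dim)
v_short = np.zeros(dim)
memory = rng.normal(size=(20, dim))  # 20 candidate memory embeddings

liked = memory[3]  # the entry the user repeatedly upvotes
for _ in range(50):
    v_short += lr * 1.0 * liked            # reward = +1 (thumbs up)
    v_long = 0.99 * v_long + 0.01 * v_short

# The combined user vector now points along the upvoted embedding,
# so that entry scores highly in subsequent retrieval rankings.
user_vec = alpha * v_long + (1 - alpha) * v_short
scores = memory @ user_vec
```

No gradient step or fine-tuning is involved; personalization lives entirely in these two per-user vectors, which is what makes the frozen-backbone approach cheap to scale.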

PersonaAgent

PersonaAgent (Amazon/UIC, 2025) is the first personalized LLM agent framework for multi-turn, long-horizon alignment. It integrates two modules: a personalized memory module and a personalized action module.

At the core, a persona — a unique system prompt per user — functions as an intermediary: it leverages memory insights to control agent actions, while action outcomes refine the persona over time.
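The persona-as-intermediary loop can be sketched as follows. All class, method, and field names here are illustrative assumptions, not PersonaAgent's actual API:

```python
# Toy sketch: a per-user system prompt that action outcomes refine.
class Persona:
    def __init__(self):
        self.traits = []  # distilled insights about this user

    def system_prompt(self):
        # The persona is realized as a unique system prompt per user.
        base = "You are a personal assistant for this user."
        if self.traits:
            base += " Known preferences: " + "; ".join(self.traits) + "."
        return base

    def refine(self, outcome):
        # Action outcomes feed back into the persona over time.
        if outcome.get("feedback") == "negative" and outcome.get("lesson"):
            self.traits.append(outcome["lesson"])

persona = Persona()
persona.refine({"feedback": "negative",
                "lesson": "avoid jargon unless asked"})
```

Because the persona is plain text, it travels with every request at zero training cost, at the price of consuming context-window budget as it grows.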

Preference Learning Approaches

| Approach | Mechanism | Pros | Cons |
|---|---|---|---|
| Explicit feedback | Thumbs up/down, ratings | Clear signal | User fatigue, sparse |
| Implicit signals | Click patterns, dwell time, edits | Rich, continuous | Noisy, indirect |
| Reinforcement fine-tuning | RLHF/DPO on user data | Deep adaptation | Compute-heavy, per-user cost |
| Frozen-backbone (VARS) | Vector updates, no fine-tuning | Scalable, instant | Limited expressiveness |
| Persona prompting | Dynamic system prompt | Zero-cost, flexible | Context window limits |
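The implicit-signals row can be made concrete: raw interaction events are folded into the same weak scalar reward that explicit thumbs up/down provides, so either source can drive online vector updates. A hypothetical heuristic, with illustrative thresholds and weights:

```python
# Hypothetical mapping from implicit signals to a weak scalar reward
# in [-1, 1]; the thresholds and weights below are assumptions.
def implicit_reward(dwell_seconds, edited, copied):
    reward = 0.0
    if dwell_seconds > 10.0:  # user spent time reading the response
        reward += 0.25
    if copied:                # user reused the output elsewhere
        reward += 0.5
    if edited:                # user had to fix the output first
        reward -= 0.5
    return max(-1.0, min(1.0, reward))
```

Noisy as these signals are individually, averaged over many interactions they provide continuous supervision without the fatigue cost of asking users to rate every response.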

Communication Style Adaptation

Effective personalization extends beyond what an agent says to how it says it: matching the user's preferred tone, verbosity, formality, and level of technical detail.

These adaptations are typically captured as lightweight persona parameters updated from interaction signals, stored alongside factual preferences in the user profile.
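One way to store such lightweight persona parameters, sketched under assumed names (not any specific framework's schema):

```python
from dataclasses import dataclass

# Illustrative style dials kept alongside factual preferences in the
# user profile; field names and the update rule are assumptions.
@dataclass
class StyleProfile:
    verbosity: float = 0.5   # 0 = terse, 1 = verbose
    formality: float = 0.5   # 0 = casual, 1 = formal
    lr: float = 0.2          # how fast feedback moves each dial

    def nudge(self, trait, direction):
        # Move one dial toward the feedback signal, clamped to [0, 1].
        value = getattr(self, trait) + self.lr * direction
        setattr(self, trait, min(1.0, max(0.0, value)))

profile = StyleProfile()
profile.nudge("verbosity", -1.0)  # user asked for shorter answers
```

Because each dial is a single scalar, the profile costs almost nothing to store or update, and can be rendered into the system prompt on every request.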
