====== Personalized Agents from Human Feedback ======
Personalized Agents from Human Feedback (PAHF) is a framework introduced by Liang et al. from Meta Superintelligence Labs and Princeton University (arXiv:2602.16173) for continual personalization of AI agents through online learning from live human interaction. PAHF addresses a fundamental limitation of current agents: they are powerful but fail to align with the idiosyncratic, evolving preferences of individual users. The framework operationalizes a three-step interaction loop with explicit per-user memory, enabling agents to learn initial preferences from scratch and rapidly adapt to preference shifts without relying on static datasets.
===== The Personalization Gap =====
Modern AI agents optimize for average user preferences through RLHF and instruction tuning, but individual users have unique, evolving needs. Prior approaches face two key limitations:
* **Static implicit models**: Train preference models on historical interaction data, but struggle with new users (cold start) and preference drift
* **External memory profiles**: Encode user preferences in retrieval systems, but lack mechanisms for systematic learning and adaptation
PAHF bridges this gap with an online continual learning framework that treats each interaction as a learning opportunity.
===== The Three-Step PAHF Loop =====
PAHF operationalizes personalization through a continuous three-step interaction loop:
**1. Pre-Action Clarification**: Before taking action, the agent proactively asks questions to resolve ambiguities in user preferences. This prevents errors from partial observability and accelerates initial learning.
**2. Preference-Grounded Actions**: The agent selects actions by retrieving stored preferences from explicit per-user memory and grounding its decisions in those preferences.
**3. Post-Action Feedback**: After acting, the agent integrates human corrections and reactions to update memory, handling preference drift and correcting miscalibrated beliefs.
<code python>
# Illustration of the PAHF three-step interaction loop
class PAHFAgent:
    def __init__(self, base_model, memory_store):
        self.model = base_model
        self.memory = memory_store  # per-user explicit memory

    def interact(self, user_id: str, task: dict) -> dict:
        user_prefs = self.memory.retrieve(user_id)

        # Step 1: Pre-action clarification
        if self.has_ambiguity(task, user_prefs):
            clarification = self.model.generate_question(task, user_prefs)
            user_response = self.get_user_input(clarification)
            self.memory.update(user_id, self.extract_prefs(user_response))
            user_prefs = self.memory.retrieve(user_id)

        # Step 2: Preference-grounded action
        action = self.model.select_action(
            task=task,
            preferences=user_prefs,
            strategy="preference_grounded",
        )
        result = self.execute(action)

        # Step 3: Post-action feedback integration
        feedback = self.get_user_feedback(result)
        if feedback.has_correction:
            self.memory.update(user_id, feedback.new_preferences)
        return result
</code>
===== Explicit Per-User Memory =====
PAHF maintains a dynamic, user-specific store of preferences that is:
* **Explicit**: Preferences are stored as interpretable key-value pairs rather than implicit neural representations
* **Persistent**: Memory survives across sessions, enabling long-term personalization
* **Updatable**: New feedback overwrites or refines existing preferences in real time
* **Retrievable**: Relevant preferences are retrieved at action time via similarity matching
This design enables rapid adaptation to new users (no historical interaction data is required, avoiding the cold-start problem) and graceful handling of preference shifts.
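The four properties above can be sketched as a minimal memory store. This is an illustrative implementation, not the paper's: the class name, JSON-file backing, and the naive token-overlap retrieval standing in for similarity matching are all assumptions made here for concreteness.

```python
# Hypothetical sketch of an explicit per-user preference memory.
# Preferences are interpretable key-value pairs persisted to disk;
# retrieval filters keys by naive token overlap with the query
# (a stand-in for the similarity matching described above).
import json
from pathlib import Path

class PreferenceMemory:
    def __init__(self, storage_dir="prefs"):
        self.dir = Path(storage_dir)
        self.dir.mkdir(exist_ok=True)

    def _path(self, user_id):
        return self.dir / f"{user_id}.json"

    def update(self, user_id, new_prefs):
        """New feedback overwrites or refines existing keys (updatable)."""
        prefs = self.retrieve(user_id)
        prefs.update(new_prefs)
        self._path(user_id).write_text(json.dumps(prefs))  # persistent

    def retrieve(self, user_id, query=None):
        """Return all prefs, or only those whose key overlaps the query."""
        path = self._path(user_id)
        prefs = json.loads(path.read_text()) if path.exists() else {}
        if query is None:
            return prefs
        terms = set(query.lower().split())
        return {k: v for k, v in prefs.items()
                if terms & set(k.lower().split("_"))}
```

Because entries are plain key-value pairs, a user (or developer) can inspect and audit exactly what the agent believes about them, which is the point of keeping memory explicit rather than implicit.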
===== Four-Phase Evaluation Protocol =====
PAHF introduces a rigorous four-phase evaluation protocol that tests both initial learning and adaptation:
^ Phase ^ Description ^ What it Tests ^
| Phase 1 | Learn initial preferences from scratch via feedback | Cold-start learning ability |
| Phase 2 | Exploit learned preferences without additional feedback | Preference retention and grounding |
| Phase 3 | Adapt to preference/persona shifts | Online adaptation speed |
| Phase 4 | Demonstrate post-shift exploitation | Updated preference stability |
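The four phases amount to alternating feedback-on learning stages with feedback-off exploitation stages, before and after a preference shift. A minimal harness makes the structure explicit; the `feedback_enabled` toggle and `interact` method are hypothetical names assumed here, not the paper's API.

```python
# Hypothetical harness for the four-phase protocol. The agent is
# assumed to expose interact(task) and a feedback_enabled flag;
# both names are illustrative.
def run_four_phase_eval(agent, tasks, shifted_tasks):
    results = {}
    # Phase 1: learn initial preferences from scratch (feedback on)
    agent.feedback_enabled = True
    results["phase1"] = [agent.interact(t) for t in tasks]
    # Phase 2: exploit learned preferences (no new feedback)
    agent.feedback_enabled = False
    results["phase2"] = [agent.interact(t) for t in tasks]
    # Phase 3: preferences/personas shift; feedback re-enabled
    agent.feedback_enabled = True
    results["phase3"] = [agent.interact(t) for t in shifted_tasks]
    # Phase 4: exploit updated preferences without feedback
    agent.feedback_enabled = False
    results["phase4"] = [agent.interact(t) for t in shifted_tasks]
    return results
```

Separating exploitation phases (2 and 4) from learning phases (1 and 3) is what lets the protocol distinguish genuine preference retention from an agent that merely reacts to corrections in the moment.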
===== Formal Learning Dynamics =====
Under PAHF, the per-round preference prediction error shrinks as interactions accumulate. Learning quality is summarized by the accumulated cumulative personalization error (ACPE) over $T$ interaction rounds:
$$\text{ACPE}(T) = \sum_{t=1}^{T} \| \hat{p}_t - p_t^* \|^2$$
where $\hat{p}_t$ is the agent's predicted preference at time $t$ and $p_t^*$ is the true user preference. PAHF with dual feedback channels (pre-action + post-action) achieves lower ACPE than either channel alone:
$$\text{ACPE}_{\text{dual}} \leq \min(\text{ACPE}_{\text{pre-only}}, \text{ACPE}_{\text{post-only}})$$
Pre-action clarification minimizes initial errors by resolving ambiguity upfront, while post-action feedback enables fast correction when predictions are wrong.
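Given the definition above, ACPE is straightforward to compute from a trace of predicted and true preference vectors. The helper below is a direct transcription of the formula, not code from the paper.

```python
# ACPE(T) = sum over rounds t of || p_hat_t - p*_t ||^2,
# where each round's preferences are numeric vectors.
def acpe(predicted, true):
    """Cumulative squared L2 error between predicted and true preferences."""
    assert len(predicted) == len(true), "need one prediction per round"
    total = 0.0
    for p_hat, p_star in zip(predicted, true):
        total += sum((a - b) ** 2 for a, b in zip(p_hat, p_star))
    return total
```

For example, an agent that is exactly right in round one and off by $(0.5, 0.5)$ in round two accumulates $0 + (0.25 + 0.25) = 0.5$. Because the sum is monotone in $T$, a lower ACPE at the same horizon means the agent spent fewer rounds acting on miscalibrated beliefs.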
===== Benchmarks and Results =====
PAHF is evaluated on two large-scale benchmarks:
* **Embodied Manipulation**: Robot tasks in household scenarios involving appliance and object preferences
* **Online Shopping**: E-commerce tasks simulating digital personalization across product categories
Results across both benchmarks show PAHF consistently outperforms baselines:
* **No-memory baselines**: Hallucinate preferences or default to population averages, failing on individual users
* **Single-channel (pre-action only)**: Cannot correct errors after action execution
* **Single-channel (post-action only)**: Slower initial learning due to trial-and-error without upfront clarification
* **PAHF (dual channel)**: Achieves highest success rates and lowest ACPE across all four evaluation phases
Post-drift adaptation (Phase 3) is particularly strong: PAHF matches or exceeds single-channel adaptation speed while maintaining lower cumulative error throughout the process.
===== References =====
* [[https://arxiv.org/abs/2602.16173|Liang et al., "Learning Personalized Agents from Human Feedback," arXiv:2602.16173, 2026]]
* [[https://ai.meta.com/research/publications/learning-personalized-agents-from-human-feedback/|Meta AI Research Publication Page]]
* [[https://github.com/facebookresearch/PAHF|PAHF GitHub Repository (Meta Research)]]
* [[https://personalized-ai.github.io|PAHF Project Page]]
===== See Also =====
* [[instruction_following_evaluation|Instruction Following Evaluation (IF-CRITIC)]]
* [[world_of_workflows_benchmark|World of Workflows Benchmark]]
* [[theory_of_code_space|Theory of Code Space (ToCS)]]