====== Personalized Agents from Human Feedback ======
Personalized Agents from Human Feedback (PAHF) is a framework introduced by Liang et al. from Meta Superintelligence Labs and Princeton University (arXiv:2602.16173) for continual personalization of AI agents through online learning from live human interaction. PAHF addresses a fundamental limitation of current agents: they are powerful but fail to align with the idiosyncratic, evolving preferences of individual users. The framework operationalizes a three-step interaction loop with explicit per-user memory, enabling agents to learn initial preferences from scratch and rapidly adapt to preference shifts without relying on static datasets.
===== The Personalization Gap =====
Modern AI agents optimize for average user preferences through RLHF and instruction tuning, but individual users have unique, evolving needs. Prior approaches face two key limitations:
* **Static implicit models**: Train preference models on historical interaction data, but struggle with new users (cold start) and preference drift
* **External memory profiles**: Encode user preferences in retrieval systems, but lack mechanisms for systematic learning and adaptation
PAHF bridges this gap with an online continual learning framework that treats each interaction as a learning opportunity.
===== The Three-Step PAHF Loop =====
PAHF operationalizes personalization through a continuous three-step interaction loop:
**1. Pre-Action Clarification**: Before taking action, the agent proactively asks questions to resolve ambiguities in user preferences. This prevents errors from partial observability and accelerates initial learning.
**2. Preference-Grounded Actions**: The agent selects actions by retrieving stored preferences from explicit per-user memory and grounding its decisions in those preferences.
**3. Post-Action Feedback**: After acting, the agent integrates human corrections and reactions to update memory, handling preference drift and correcting miscalibrated beliefs.
<code python>
# Illustration of the PAHF three-step interaction loop
class PAHFAgent:
    def __init__(self, base_model, memory_store):
        self.model = base_model
        self.memory = memory_store  # per-user explicit memory

    def interact(self, user_id: str, task: dict) -> dict:
        user_prefs = self.memory.retrieve(user_id)

        # Step 1: Pre-action clarification
        if self.has_ambiguity(task, user_prefs):
            clarification = self.model.generate_question(task, user_prefs)
            user_response = self.get_user_input(clarification)
            self.memory.update(user_id, self.extract_prefs(user_response))
            user_prefs = self.memory.retrieve(user_id)

        # Step 2: Preference-grounded action
        action = self.model.select_action(
            task=task,
            preferences=user_prefs,
            strategy="preference_grounded",
        )
        result = self.execute(action)

        # Step 3: Post-action feedback integration
        feedback = self.get_user_feedback(result)
        if feedback.has_correction:
            self.memory.update(user_id, feedback.new_preferences)
        return result
</code>
===== Explicit Per-User Memory =====
PAHF maintains a dynamic, user-specific store of preferences that is:
* **Explicit**: Preferences are stored as interpretable key-value pairs rather than implicit neural representations
* **Persistent**: Memory survives across sessions, enabling long-term personalization
* **Updatable**: New feedback overwrites or refines existing preferences in real time
* **Retrievable**: Relevant preferences are retrieved at action time via similarity matching
This design enables rapid adaptation to new users (no historical interaction data is required, avoiding the cold-start problem) and graceful handling of preference shifts.
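The four properties above can be sketched as a minimal memory store. This is an illustrative implementation, not the paper's: the class name, JSON-file backing, and the naive token-overlap retrieval standing in for similarity matching are all assumptions made here for concreteness.

```python
# Hypothetical sketch of an explicit per-user preference memory.
# Preferences are interpretable key-value pairs persisted to disk;
# retrieval filters keys by naive token overlap with the query
# (a stand-in for the similarity matching described above).
import json
from pathlib import Path

class PreferenceMemory:
    def __init__(self, storage_dir="prefs"):
        self.dir = Path(storage_dir)
        self.dir.mkdir(exist_ok=True)

    def _path(self, user_id):
        return self.dir / f"{user_id}.json"

    def update(self, user_id, new_prefs):
        """New feedback overwrites or refines existing keys (updatable)."""
        prefs = self.retrieve(user_id)
        prefs.update(new_prefs)
        self._path(user_id).write_text(json.dumps(prefs))  # persistent

    def retrieve(self, user_id, query=None):
        """Return all prefs, or only those whose key overlaps the query."""
        path = self._path(user_id)
        prefs = json.loads(path.read_text()) if path.exists() else {}
        if query is None:
            return prefs
        terms = set(query.lower().split())
        return {k: v for k, v in prefs.items()
                if terms & set(k.lower().split("_"))}
```

Because entries are plain key-value pairs, a user (or developer) can inspect and audit exactly what the agent believes about them, which is the point of keeping memory explicit rather than implicit.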
===== Four-Phase Evaluation Protocol =====
PAHF introduces a rigorous four-phase evaluation protocol that tests both initial learning and adaptation:
^ Phase ^ Description ^ What it Tests ^
| Phase 1 | Learn initial preferences from scratch via feedback | Cold-start learning ability |
| Phase 2 | Exploit learned preferences without additional feedback | Preference retention and grounding |
| Phase 3 | Adapt to preference/persona shifts | Online adaptation speed |
| Phase 4 | Demonstrate post-shift exploitation | Updated preference stability |
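The four phases amount to alternating feedback-on learning stages with feedback-off exploitation stages, before and after a preference shift. A minimal harness makes the structure explicit; the `feedback_enabled` toggle and `interact` method are hypothetical names assumed here, not the paper's API.

```python
# Hypothetical harness for the four-phase protocol. The agent is
# assumed to expose interact(task) and a feedback_enabled flag;
# both names are illustrative.
def run_four_phase_eval(agent, tasks, shifted_tasks):
    results = {}
    # Phase 1: learn initial preferences from scratch (feedback on)
    agent.feedback_enabled = True
    results["phase1"] = [agent.interact(t) for t in tasks]
    # Phase 2: exploit learned preferences (no new feedback)
    agent.feedback_enabled = False
    results["phase2"] = [agent.interact(t) for t in tasks]
    # Phase 3: preferences/personas shift; feedback re-enabled
    agent.feedback_enabled = True
    results["phase3"] = [agent.interact(t) for t in shifted_tasks]
    # Phase 4: exploit updated preferences without feedback
    agent.feedback_enabled = False
    results["phase4"] = [agent.interact(t) for t in shifted_tasks]
    return results
```

Separating exploitation phases (2 and 4) from learning phases (1 and 3) is what lets the protocol distinguish genuine preference retention from an agent that merely reacts to corrections in the moment.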
===== Formal Learning Dynamics =====
Under PAHF, the per-round preference prediction error shrinks as interactions accumulate. Learning quality is summarized by the accumulated cumulative personalization error (ACPE) over $T$ interaction rounds:
$$\text{ACPE}(T) = \sum_{t=1}^{T} \| \hat{p}_t - p_t^* \|^2$$
where $\hat{p}_t$ is the agent's predicted preference at time $t$ and $p_t^*$ is the true user preference. PAHF with dual feedback channels (pre-action + post-action) achieves lower ACPE than either channel alone:
$$\text{ACPE}_{\text{dual}} \leq \min(\text{ACPE}_{\text{pre-only}}, \text{ACPE}_{\text{post-only}})$$
Pre-action clarification minimizes initial errors by resolving ambiguity upfront, while post-action feedback enables fast correction when predictions are wrong.
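Given the definition above, ACPE is straightforward to compute from a trace of predicted and true preference vectors. The helper below is a direct transcription of the formula, not code from the paper.

```python
# ACPE(T) = sum over rounds t of || p_hat_t - p*_t ||^2,
# where each round's preferences are numeric vectors.
def acpe(predicted, true):
    """Cumulative squared L2 error between predicted and true preferences."""
    assert len(predicted) == len(true), "need one prediction per round"
    total = 0.0
    for p_hat, p_star in zip(predicted, true):
        total += sum((a - b) ** 2 for a, b in zip(p_hat, p_star))
    return total
```

For example, an agent that is exactly right in round one and off by $(0.5, 0.5)$ in round two accumulates $0 + (0.25 + 0.25) = 0.5$. Because the sum is monotone in $T$, a lower ACPE at the same horizon means the agent spent fewer rounds acting on miscalibrated beliefs.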
===== Benchmarks and Results =====
PAHF is evaluated on two large-scale benchmarks:
* **Embodied Manipulation**: Robot tasks in household scenarios involving appliance and object preferences
* **Online Shopping**: E-commerce tasks simulating digital personalization across product categories
Results across both benchmarks show PAHF consistently outperforms baselines:
* **No-memory baselines**: Hallucinate preferences or default to population averages, failing on individual users
* **Single-channel (pre-action only)**: Cannot correct errors after action execution
* **Single-channel (post-action only)**: Slower initial learning due to trial-and-error without upfront clarification
* **PAHF (dual channel)**: Achieves highest success rates and lowest ACPE across all four evaluation phases
Post-drift adaptation (Phase 3) is particularly strong: PAHF matches or exceeds single-channel adaptation speed while maintaining lower cumulative error throughout the process.
===== References =====
* [[https://arxiv.org/abs/2602.16173|Liang et al., "Learning Personalized Agents from Human Feedback," arXiv:2602.16173, 2026]]
* [[https://ai.meta.com/research/publications/learning-personalized-agents-from-human-feedback/|Meta AI Research Publication Page]]
* [[https://github.com/facebookresearch/PAHF|PAHF GitHub Repository (Meta Research)]]
* [[https://personalized-ai.github.io|PAHF Project Page]]
===== See Also =====
* [[instruction_following_evaluation|Instruction Following Evaluation (IF-CRITIC)]]
* [[world_of_workflows_benchmark|World of Workflows Benchmark]]
* [[theory_of_code_space|Theory of Code Space (ToCS)]]