SOTOPIA is an open-ended simulation environment and benchmark for evaluating the social intelligence of AI agents through complex, multi-turn social interactions. Introduced by Zhou et al. (2023), it provides 90 procedurally generated social scenarios spanning cooperative, competitive, and mixed-motive settings, scored across seven sociological dimensions.
Traditional agent benchmarks focus on task completion in isolation. SOTOPIA addresses a critical gap: measuring how well agents navigate the nuanced social dynamics that characterize real human interaction. Agents are placed in realistic social scenarios (negotiating, persuading, maintaining relationships) and evaluated holistically using SOTOPIA-EVAL.
Interactions are modeled as partially observable Markov decision processes (POMDPs), where each agent acts based on limited observations:
<latex>\pi(a_t | o_{1:t}, s_t, g)</latex>
where <latex>o_{1:t}</latex> are past observations, <latex>s_t</latex> is the agent's internal state, and <latex>g</latex> is the social goal.
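As a rough illustration of the policy above, the sketch below shows an agent that accumulates observations, updates a small internal state, and picks an action conditioned on all three inputs. The class `SocialAgent`, the function `echo_policy`, and the example dialogue are hypothetical names invented for this sketch, not part of SOTOPIA; the policy here is deterministic for simplicity, whereas the formula describes a distribution over actions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SocialAgent:
    """Hypothetical agent whose action depends on (o_{1:t}, s_t, g)."""

    goal: str                                              # g: the social goal
    policy: Callable[[List[str], Dict, str], str]          # stands in for pi
    observations: List[str] = field(default_factory=list)  # o_{1:t}
    state: Dict = field(default_factory=dict)              # s_t

    def act(self, observation: str) -> str:
        # The agent only sees its own observation stream (partial observability).
        self.observations.append(observation)
        self.state["turn"] = len(self.observations)
        return self.policy(self.observations, self.state, self.goal)


def echo_policy(observations: List[str], state: Dict, goal: str) -> str:
    # Toy policy: restate the goal and react to the latest observation.
    return f"[turn {state['turn']}] pursuing '{goal}' after: {observations[-1]}"


agent = SocialAgent(goal="agree on a price", policy=echo_policy)
first = agent.act("Seller: I want $100.")
second = agent.act("Seller: $90 is my final offer.")
print(first)
print(second)
```

In a language-model agent, `echo_policy` would be replaced by a call that formats the history, state, and goal into a prompt and samples a reply.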
SOTOPIA-EVAL scores agents across seven dimensions inspired by sociology, psychology, and economics:
| Dimension | Range | Description |
|---|---|---|
| Goal Completion (GOAL) | [0, 10] | Extent of achieving the primary social goal |
| Believability (BEL) | [0, 10] | Fidelity to assigned persona and character consistency |
| Knowledge (KNO) | [0, 10] | Effectiveness in acquiring relevant information |
| Secret (SEC) | [-10, 0] | Success in concealing private information |
| Relationship (REL) | [-5, 5] | Net social value created; relationship maintenance |
| Social Rules (SOC) | [-10, 0] | Adherence to social, legal, and ethical norms |
| Financial/Material (FIN) | [-5, 5] | Impact on tangible financial or material outcomes |
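The ranges in the table can be encoded as a lookup and used to sanity-check a set of scores before aggregation. This is a minimal sketch: the names `SCORE_RANGES` and `validate_scores` are assumptions made for illustration, and only the abbreviations and bounds come from the table.

```python
# Bounds copied from the table above; the abbreviations are used as keys.
SCORE_RANGES = {
    "GOAL": (0, 10),
    "BEL": (0, 10),
    "KNO": (0, 10),
    "SEC": (-10, 0),
    "REL": (-5, 5),
    "SOC": (-10, 0),
    "FIN": (-5, 5),
}


def validate_scores(scores):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for dim, (low, high) in SCORE_RANGES.items():
        if dim not in scores:
            problems.append(f"{dim}: missing")
        elif not low <= scores[dim] <= high:
            problems.append(f"{dim}: {scores[dim]} outside [{low}, {high}]")
    return problems


good = {"GOAL": 8, "BEL": 9, "KNO": 5, "SEC": 0, "REL": 3, "SOC": -1, "FIN": 4}
bad = dict(good, SEC=2)  # a positive Secret score is out of range

print(validate_scores(good))
print(validate_scores(bad))
```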
The overall score is computed as:
<latex>S_{\text{overall}} = \frac{1}{7} \sum_{d \in D} S_d</latex>
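A worked instance of the averaging formula, using made-up scores for a single agent (the numbers are illustrative, not results from the benchmark), might look like this:

```python
def overall_score(scores):
    """Unweighted mean over the dimension scores, as in the formula above."""
    return sum(scores.values()) / len(scores)


# Illustrative scores only; each value lies within its dimension's range.
example = {"GOAL": 8, "BEL": 9, "KNO": 5, "SEC": 0, "REL": 3, "SOC": -1, "FIN": 4}

total = sum(example.values())      # 8 + 9 + 5 + 0 + 3 - 1 + 4 = 28
overall = overall_score(example)   # 28 / 7 = 4.0
print(total, overall)
```

Because the dimensions have different ranges, the unweighted mean is bounded by -30/7 (about -4.29) and 40/7 (about 5.71) rather than by a symmetric interval.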
Scenarios are procedurally generated with automated character creation, relationship assignment, and goal specification. This enables scalable simulation across diverse social contexts.
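The combination step can be pictured as sampling a scenario, a pair of characters, and a relationship, then attaching one goal per agent. The sketch below is an assumption-laden toy, not SOTOPIA's actual generator: the `Episode` type, the `sample_episode` function, and all scenario, character, and relationship entries are invented placeholders.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Episode:
    scenario: str
    characters: Tuple[str, str]
    relationship: str
    goals: Tuple[str, str]  # one private goal per agent


# Placeholder pools; a real benchmark would load curated profiles instead.
SCENARIOS = {
    "selling a used bike": ("sell for a high price", "buy for a low price"),
    "planning a shared trip": ("choose the beach", "choose the mountains"),
}
CHARACTERS = ["Character A", "Character B", "Character C", "Character D"]
RELATIONSHIPS = ["strangers", "acquaintances", "friends"]


def sample_episode(rng: random.Random) -> Episode:
    scenario = rng.choice(sorted(SCENARIOS))      # sorted for reproducibility
    characters = tuple(rng.sample(CHARACTERS, 2)) # two distinct characters
    relationship = rng.choice(RELATIONSHIPS)
    return Episode(scenario, characters, relationship, SCENARIOS[scenario])


episode = sample_episode(random.Random(0))
print(episode)
```

Passing a seeded `random.Random` keeps sampled episodes reproducible, which matters when comparing agents on the same set of tasks.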
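```python
# Evaluating an agent pair in SOTOPIA using the framework API.
# Illustrative sketch: the constructor arguments and env.evaluate() below are
# simplified and may not match the released sotopia package; consult its
# documentation for the exact signatures.
from sotopia.envs import ParallelSotopiaEnv
from sotopia.agents import LLMAgent

env = ParallelSotopiaEnv(scenario_id="negotiation_01")
agent1 = LLMAgent(model="gpt-4", persona="assertive_negotiator")
agent2 = LLMAgent(model="gpt-4", persona="cooperative_partner")

# Run the multi-turn interaction until the environment signals termination.
obs = env.reset()
done = False
while not done:
    # Each agent acts only on its own (partial) observation.
    actions = {
        "agent1": agent1.act(obs["agent1"]),
        "agent2": agent2.act(obs["agent2"]),
    }
    obs, rewards, done, info = env.step(actions)

# Retrieve 7-dimension evaluation scores
scores = env.evaluate()
for dim, score in scores.items():
    print(f"{dim}: {score:.2f}")
```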