====== ChemCrow: Augmenting LLMs with Chemistry Tools ======

**ChemCrow** is an LLM-powered chemistry agent introduced by Bran et al. (2023) that augments GPT-4 with **18 expert-designed chemistry tools** to autonomously perform tasks in organic synthesis, drug discovery, and materials design(([[https://arxiv.org/abs/2304.05376|Bran et al. "ChemCrow: Augmenting large-language models with chemistry tools" (2023)]])). With **795 citations**, it demonstrates that domain-specific tool augmentation enables LLMs to bridge the gap between computational and experimental chemistry, including executing syntheses on robotic platforms.

[[https://arxiv.org/abs/2304.05376|arXiv:2304.05376]]

===== The 18 Chemistry Expert Tools =====

ChemCrow integrates tools across four categories:

==== General Tools ====

  * **WebSearch**: Query the web for chemical information
  * **LitSearch**: Search scientific literature databases
  * **Python REPL**: Execute computational chemistry code
  * **HumanInput**: Query a human expert when uncertain

==== Molecular Analysis and Design ====

  * **Name2SMILES / CAS2SMILES / SMILES2Name**: Chemical name and identifier conversions
  * **FuncGroups**: Identify functional groups in molecules
  * **Similarity**: Compute molecular similarity scores using the Tanimoto coefficient
  * **SMILES2Weight**: Calculate molecular weight from a SMILES representation
  * **ModifyMol**: Generate structural modifications for retro/forward synthesis

==== Safety and Compliance ====

  * **ChemicalWeaponCheck**: Screen compounds against chemical weapons databases (CAS-based)
  * **ExplosiveCheck**: Detect explosive properties via PubChem
  * **PatentChecker**: Verify patent status of compounds

==== Reaction and Synthesis ====

  * **RXNPredict**: Predict reaction products using RXN4Chemistry
  * **RXNPlanner**: Generate multi-step synthetic routes with conditions and solvents
  * **NamedRxnFinder**: Identify named reactions in transformations

===== ReAct Reasoning Loop =====

ChemCrow employs the ReAct framework
for iterative reasoning(([[https://arxiv.org/abs/2210.03629|Yao et al. "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)]])). At each step $t$, the agent generates:

$$a_t = \text{LLM}(\text{Thought}_t, \text{Action}_t, \text{Input}_t \mid h_{<t})$$

where $h_{<t}$ is the history of thoughts, actions, and observations from earlier steps. The control flow is summarized in the following diagram:

<mermaid>
graph TD
    A[User Chemistry Query] --> B[GPT-4 Reasoning Engine]
    B --> C{Select Tool}
    C --> D[General Tools]
    C --> E[Molecular Analysis]
    C --> F[Safety Checks]
    C --> G[Reaction and Synthesis]
    D --> H[Observation]
    E --> H
    F --> H
    G --> H
    H --> I{Safety Verified?}
    I -->|No| J[Halt: Safety Violation]
    I -->|Yes| K{Task Complete?}
    K -->|No| B
    K -->|Yes| L[Final Answer with Citations]
    G --> M[RXN4Chemistry API]
    G --> N[RoboRXN Robotic Platform]
    E --> O[PubChem Database]
</mermaid>

===== Code Example =====

<code python>
# Simplified ChemCrow agent with tool selection.
# The helper functions (pubchem_lookup, rxn4chemistry_predict, check_cw_database,
# is_final_answer, compile_answer, ...) and the llm object are placeholders
# that are not defined here.
TOOLS = {
    "Name2SMILES": lambda name: pubchem_lookup(name, "smiles"),
    "RXNPredict": lambda rxn: rxn4chemistry_predict(rxn),
    "RXNPlanner": lambda target: rxn4chemistry_plan(target),
    "ChemicalWeaponCheck": lambda cas: check_cw_database(cas),
    "ExplosiveCheck": lambda smiles: check_explosive(smiles),
    "Similarity": lambda pair: tanimoto_similarity(*pair),
    "FuncGroups": lambda smiles: identify_groups(smiles),
}


def chemcrow_run(query, llm, max_steps=15):
    history = []
    for step in range(max_steps):
        thought, action, action_input = llm.reason(query, history)

        # Safety gate: always check before synthesis.
        # Both checks are assumed to return True when the compound passes.
        if action in ("RXNPlanner", "RXNPredict"):
            smiles = TOOLS["Name2SMILES"](action_input)
            if not TOOLS["ChemicalWeaponCheck"](smiles):
                return "Halted: safety violation detected"
            if not TOOLS["ExplosiveCheck"](smiles):
                return "Halted: explosive compound detected"

        # Dispatch to the selected tool and record the observation.
        observation = TOOLS[action](action_input)
        history.append((thought, action, observation))

        if is_final_answer(thought):
            return compile_answer(history)

    # Step budget exhausted without reaching a final answer.
    return "Halted: maximum number of steps reached"
</code>

===== Demonstrated Capabilities =====

  * **Autonomous synthesis**: Successfully planned and executed the synthesis of an insect repellent and three organocatalysts
  * **Drug discovery**: Guided novel chromophore discovery with target optical properties
  * **Safety enforcement**: Proactive screening against chemical weapons and explosives databases
  * **Robotic execution**: Integrated with IBM RoboRXN for automated wet-lab synthesis(([[https://rxn.res.ibm.com|IBM RXN for Chemistry Platform]]))
  * **Expert evaluation**: Outperforms vanilla GPT-4 on chemistry-specific tasks, according to expert evaluators

===== See Also =====

  * [[data_science_agents|Data Science Agents: DatawiseAgent]]
  * [[tool_use|Tool Use for LLM Agents]]
  * [[agenttuning|AgentTuning: Enabling Generalized Agent Capabilities in LLMs]]

===== References =====
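
===== Supplementary Sketch: Tanimoto Similarity =====

The **Similarity** tool listed earlier scores pairs of molecules with the Tanimoto coefficient, $T(A, B) = |A \cap B| / |A \cup B|$, where $A$ and $B$ are the sets of "on" bits in two molecular fingerprints. The block below is a minimal sketch of that formula only and should not be read as ChemCrow's implementation. The function name ''tanimoto_similarity'' simply mirrors the placeholder in the code example, and the fingerprints are hypothetical hand-written sets of bit indices, not the output of a cheminformatics library.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical fingerprints (illustrative bit indices, not real molecules).
fp_1 = {1, 4, 7, 9}
fp_2 = {1, 4, 8, 9, 12}

# Shared bits: {1, 4, 9} (3); union: {1, 4, 7, 8, 9, 12} (6).
score = tanimoto_similarity(fp_1, fp_2)
print(f"Tanimoto similarity: {score:.2f}")  # prints "Tanimoto similarity: 0.50"
```

In practice the bit sets would come from a fingerprinting routine in a cheminformatics toolkit such as RDKit. The set-based version is used here only to keep the arithmetic visible.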