Differences

This shows you the differences between two versions of the page.

--- video_editing_agents [2026/03/25 14:52] – Create page: LLM agents for video editing agent
+++ video_editing_agents [2026/03/30 22:39] (current) – Restructure: footnotes as references agent
@@ Line 1: / Line 1: @@
 ====== Video Editing Agents ======
-LLM-powered agents for video editing enable prompt-driven autonomous editing workflows, transforming natural language instructions into structured edit operations over long-form video content through hierarchical semantic indexing and agentic planning.
+LLM-powered agents for video editing enable prompt-driven autonomous editing workflows, transforming natural language instructions into structured edit operations over long-form video content through hierarchical semantic indexing and agentic planning.(([[https://arxiv.org/abs/2509.16811|"Prompt-Driven Agentic Video Editing with Hierarchical Semantic Indexing" (2025)]]))
 ===== Overview =====
@@ Line 9: / Line 9: @@
 ===== Prompt-Driven Agentic Video Editing =====
-The framework introduced in the prompt-driven agentic editing paper uses a modular, cloud-native pipeline for long-form video comprehension and editing:
+The framework introduced in the prompt-driven agentic editing paper uses a modular, cloud-native pipeline for long-form video comprehension and editing:(([[https://arxiv.org/abs/2509.16811|"Prompt-Driven Agentic Video Editing with Hierarchical Semantic Indexing" (2025)]]))
   * **Ingestion Module**: Processes raw video into analyzable segments
@@ Line 24: / Line 24: @@
 ===== LAVE: Agent-Assisted Video Editing =====
-LAVE (LLM Agent-assisted Video Editing) implements a semi-autonomous workflow where the agent collaborates with the user:
+LAVE (LLM Agent-assisted Video Editing) implements a semi-autonomous workflow where the agent collaborates with the user:(([[https://arxiv.org/abs/2402.10294|Wang et al. "LAVE: LLM-Powered Agent-Assisted Video Editing" (2024)]]))
 **Backend Processing**: Video frames are sampled every second, captioned using VLMs (e.g., LLaVA), then processed by GPT-4 to generate titles, summaries, and unique clip IDs, converting visual content to text for LLM processing.
@@ Line 32: / Line 32: @@
   - **Execute State**: Agent performs approved actions sequentially, presenting results for user refinement
-A user study with 8 participants (novices to experts) demonstrated LAVE produces satisfactory videos rated as easy to use and useful, enhancing creativity and the sense of co-creation.
+A user study with 8 participants (novices to experts) demonstrated LAVE produces satisfactory videos rated as easy to use and useful, enhancing creativity and the sense of co-creation.(([[https://arxiv.org/abs/2402.10294|Wang et al. "LAVE: LLM-Powered Agent-Assisted Video Editing" (2024)]]))
 ===== Story-Driven Editing =====
@@ Line 133: / Line 133: @@
 | LAVE | Semi-autonomous (user approves) | Brainstorming + storyboarding | 8 participants, positive |
 | VideoAgent | Agentic framework | Understanding + editing | General performance |
-===== References =====
-  * [[https://arxiv.org/abs/2509.16811|"Prompt-Driven Agentic Video Editing with Hierarchical Semantic Indexing" (2025)]]
-  * [[https://arxiv.org/abs/2402.10294|Wang et al. "LAVE: LLM-Powered Agent-Assisted Video Editing" (2024)]]
 ===== See Also =====
@@ Line 144: / Line 139: @@
   * [[music_composition_agents|Music Composition Agents]]
   * [[game_playing_agents|Game Playing Agents]]
+===== References =====
