How to Write a Deep Research Workflow

This guide shows you how to build a multi-step deep research workflow using cyberian, based on modern deep research agent patterns.

cyberian's research workflows implement the Ralph Wiggum Pattern - an autonomous iteration technique where the agent loops until completion, with state persisted in files (PLAN.md, REPORT.md, citations/) rather than relying on conversation history.

What is Deep Research?

According to recent research, true deep research agents are autonomous AI systems with sophisticated capabilities:

Dynamic reasoning & planning - Adaptive long-horizon planning that adjusts based on findings
Multi-hop information retrieval - Following citation chains, synthesizing across multiple sources
Browser-based exploration - Not just API calls, but autonomous web interaction
Code execution & tool use - Running analyses, processing data, invoking specialized tools
Structured synthesis - Producing organized, well-cited reports with transparent reasoning

As of 2025, systems like OpenAI's Deep Research, Google's Gemini Deep Research, and Microsoft's Researcher can autonomously search hundreds of sources, perform multi-step analysis, and generate comprehensive reports.

A Simple Iterative Research Workflow

The workflow in tests/examples/deep-research.yaml is a simplified version focused on the core pattern of iterative research:

┌─────────────────────┐
│  Initial Search     │  → Create research plan, start report
└─────────────────────┘
          ↓
┌─────────────────────┐
│  Iterate Loop       │  → Follow plan, gather sources, refine
└─────────────────────┘  ↑
          ↓              │
    [Loop until          │
     exhausted] ─────────┘

What this workflow does:
- ✅ Iterative research with loop-until-exhausted pattern
- ✅ Structured file organization (plan, report, citations)
- ✅ Manual citation tracking via agent instructions
- ✅ Agent decides when research is complete

What it doesn't do (yet):
- ❌ Multi-hop retrieval (no automatic citation following)
- ❌ Dynamic re-planning (plan is updated manually by agent, not automatically)
- ❌ Browser automation (relies on agent's web capabilities)
- ❌ Code execution for analysis (just file organization)

This is a good starting point for building research workflows, but not a full deep research system.

Complete Workflow

Here's the full workflow with detailed annotations:

name: deep-research
description: Iteratively perform deep research on a topic

requires_workdir: true  # Must specify --workdir for outputs

params:
  query:
    range: string
    required: true
    examples:
      - "CRISPR applications in 2024"
      - "quantum computing error correction"

  workdir:
    range: string
    required: true
    examples:
      - "/tmp/research-output"

subtasks:
  # Phase 1: Initial search and planning
  initial_search:
    instructions: |
      Perform deep research on {{query}}.
      Keep all files in {{workdir}}.

      ## Start with a Plan

      Always start by writing a research plan in `PLAN.md`:
      - Key questions to investigate
      - Search strategies
      - Types of sources needed
      - How to structure the final report

      ## Create the Report

      Maintain a report in `REPORT.md`. Update this as you go along.
      Structure:
      - Executive summary
      - Main findings (organized by theme)
      - Citation graph insights
      - Open questions

      ## Organize Citations

      Create a `citations/` folder with files named like:
      `smith-2002-autophagy-paper.pdf` (if PDF available)
      `smith-2002-autophagy-fulltext.txt` (converted text, MUST be actual)
      `smith-2002-autophagy-summary.md` (YOUR summary)
      `jones-2003-regulation-abstract.md` (MUST be actual abstract)
      `jones-2003-regulation-summary.md` (YOUR summary)

      ## Track Citation Relationships

      Maintain `CITATION_GRAPH.md` with entries like:
      `* [smith-2002] cites [jones-2003] claiming "X regulates Y"`

      This helps identify:
      - Seminal papers (heavily cited)
      - Emerging ideas (recent citations)
      - Controversies (conflicting claims)

      ## Citation Rules

      - Always cite in report: "Some finding[smith-2002-autophagy]"
      - Get full text when available (never fabricate it)
      - Default to abstracts if full text unavailable
      - Use suffixes to denote type: `-paper.pdf`, `-abstract.md`, `-summary.md`
      - Your summaries go in `-summary.md`, actual content in other files

      ## Additional Notes

      You can create additional notes files as needed:
      - `METHODOLOGY_NOTES.md`
      - `KEY_FINDINGS.md`
      - `CONTRADICTIONS.md`

  # Phase 2: Iterative deepening
  iterate:
    instructions: |
      Continue researching {{query}}, following the plan in `PLAN.md`.

      For each iteration:
      1. Check `PLAN.md` for next research direction
      2. Search for relevant sources
      3. Read and summarize new papers
      4. Update `CITATION_GRAPH.md` with new relationships
      5. Update `REPORT.md` with new findings
      6. Update `PLAN.md` if new directions emerge

      Keep iterating until you've:
      - Covered all major perspectives on the topic
      - Followed key citation chains
      - Addressed the main questions in PLAN.md
      - Identified remaining open questions

      When research is exhausted, yield:
      COMPLETION_STATUS: NO_MORE_RESEARCH

    loop_until:
      status: NO_MORE_RESEARCH
      message: |
        If you think all research avenues are exhausted, then
        yield status NO_MORE_RESEARCH

How It Works

Phase 1: Initial Search

The agent:
1. Creates a research plan (PLAN.md)
2. Performs initial searches
3. Sets up the file structure
4. Begins the report
5. Downloads and organizes first batch of sources

Phase 2: Iterative Loop

The agent repeatedly:
1. Checks the plan for next steps
2. Searches for new sources
3. Reads and extracts information
4. Updates citation graph
5. Refines the report
6. Updates plan if new directions discovered

The loop continues until the agent determines research is exhausted.
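
The loop can be pictured as a small driver that repeats the iterate step until the agent's reply reports the completion status. The sketch below is illustrative only and is not cyberian's implementation: `run_step` is a hypothetical callable standing in for one agent turn, and `max_iterations` is an assumed safety cap that the workflow above does not define.

```python
from typing import Callable, Optional

STATUS_PREFIX = "COMPLETION_STATUS:"


def extract_status(reply: str) -> Optional[str]:
    """Return the value of the last COMPLETION_STATUS line in a reply, if any."""
    status = None
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith(STATUS_PREFIX):
            status = line[len(STATUS_PREFIX):].strip()
    return status


def loop_until(run_step: Callable[[int], str], status: str,
               max_iterations: int = 50) -> int:
    """Call run_step until a reply reports `status`; return the iterations used.

    run_step is a hypothetical stand-in for sending the iterate
    instructions to the agent and returning its reply text.
    """
    for iteration in range(1, max_iterations + 1):
        reply = run_step(iteration)
        if extract_status(reply) == status:
            return iteration
    raise RuntimeError(
        f"status {status!r} not reached after {max_iterations} iterations"
    )
```

Because all progress lives in PLAN.md, REPORT.md, and citations/, each call to `run_step` can start from the files rather than from earlier replies.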

File Structure Example

After running, your workspace will look like:

/tmp/research-crispr/
├── PLAN.md                          # Research strategy
├── REPORT.md                        # Main findings
├── CITATION_GRAPH.md                # Citation relationships
└── citations/
    ├── doudna-2012-crispr-paper.pdf
    ├── doudna-2012-crispr-fulltext.txt
    ├── doudna-2012-crispr-summary.md
    ├── zhang-2013-genome-abstract.md
    ├── zhang-2013-genome-summary.md
    └── cong-2013-multiplex-abstract.md
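
One way to audit a finished workspace is to confirm that every citation key used in REPORT.md has at least one matching file in citations/. The following is a minimal sketch and not part of cyberian; the function name `unresolved_citations` and the key pattern (author, four-digit year, single keyword) are assumptions based on the examples in this guide.

```python
import re

# Matches keys such as [smith-2002-autophagy]: author, 4-digit year, keyword.
CITATION_KEY = re.compile(r"\[([a-z]+-\d{4}-[a-z0-9]+)\]")


def unresolved_citations(report_text, citation_files):
    """Return citation keys used in the report that have no file in citations/."""
    cited = sorted(set(CITATION_KEY.findall(report_text)))
    return [
        key for key in cited
        # A key is resolved when some file name starts with "<key>-".
        if not any(name.startswith(key + "-") for name in citation_files)
    ]
```

For a real workspace, pass the text of REPORT.md and the result of `os.listdir` on the citations/ folder.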

Running the Workflow

Basic Run

# Create workspace
mkdir -p /tmp/research-output

# Start agent server
cyberian server start claude --skip-permissions --dir /tmp/research-output

# Run research
cyberian run tests/examples/deep-research.yaml \
  --workdir /tmp/research-output \
  --param query="CRISPR gene editing applications 2024" \
  --param workdir="/tmp/research-output"

With Custom Timeout

Deep research can take time. Increase timeout if needed:

cyberian run tests/examples/deep-research.yaml \
  --workdir /tmp/research-output \
  --param query="quantum error correction" \
  --param workdir="/tmp/research-output" \
  --timeout 3600  # 1 hour

Monitoring Progress

Watch the report as it evolves:

# In another terminal
watch -n 5 cat /tmp/research-output/REPORT.md

Key Design Patterns

1. Structured File Organization

Why it matters: Enables the agent to track sources, avoid duplication, and cite properly.

citations/
  smith-2002-autophagy-paper.pdf      # Original source
  smith-2002-autophagy-fulltext.txt   # Extracted text (actual)
  smith-2002-autophagy-summary.md     # Agent's summary

The naming convention (author-year-topic-type) makes it easy to:
- Identify sources at a glance
- Avoid duplicate downloads
- Know what you have (abstract vs. full text)
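
Because the names are regular, a short script can report what is on hand for each source. This is a sketch under assumptions: the type suffixes (paper, fulltext, abstract, summary) and extensions are taken from the examples in this guide, and `inventory` is a hypothetical helper, not a cyberian function.

```python
import re
from collections import defaultdict

# author-year-keyword-type.ext, e.g. smith-2002-autophagy-summary.md
NAME = re.compile(
    r"^(?P<key>[a-z]+-\d{4}-[a-z0-9]+)"
    r"-(?P<type>paper|fulltext|abstract|summary)\.(?:pdf|txt|md)$"
)


def inventory(filenames):
    """Map each source key to the set of artifact types present for it."""
    found = defaultdict(set)
    for name in filenames:
        match = NAME.match(name)
        if match:  # files outside the convention are skipped
            found[match["key"]].add(match["type"])
    return dict(found)
```

A source whose set contains only `abstract` is a candidate for a full-text lookup in a later iteration.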

2. Citation Graph Tracking

Why it matters: Reveals research landscape structure.

# CITATION_GRAPH.md
* [doudna-2012] cites [jinek-2012] for "RNA-guided DNA cleavage"
* [zhang-2013] cites [doudna-2012] for "CRISPR system"
* [cong-2013] cites [doudna-2012] for "Cas9 mechanism"

This helps identify:
- Seminal papers - Heavily cited (e.g., doudna-2012)
- Research fronts - Recent work with few citations
- Consensus vs. debate - Competing claims
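
The graph file is simple enough to parse with a regular expression. The sketch below assumes every entry follows the `* [a] cites [b] ...` shape shown above; `parse_edges` and `most_cited` are hypothetical helper names, and ranking by incoming citations is just one plausible way to surface seminal papers.

```python
import re
from collections import Counter

# Matches lines like: * [smith-2002] cites [jones-2003] claiming "X regulates Y"
EDGE = re.compile(r"^\*\s*\[([^\]]+)\]\s+cites\s+\[([^\]]+)\]")


def parse_edges(graph_text):
    """Return (citing, cited) pairs found in CITATION_GRAPH.md text."""
    edges = []
    for line in graph_text.splitlines():
        match = EDGE.match(line.strip())
        if match:
            edges.append((match.group(1), match.group(2)))
    return edges


def most_cited(graph_text, top=5):
    """Rank sources by the number of distinct sources that cite them."""
    unique_edges = set(parse_edges(graph_text))
    counts = Counter(cited for _, cited in unique_edges)
    return counts.most_common(top)
```

Sources that appear only on the citing side of an edge, with no incoming citations, are the likely research fronts.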

3. Living Research Plan

Why it matters: Research is non-linear. Plans must adapt.

# PLAN.md

## Initial Questions
1. What are current CRISPR applications?
2. What are the limitations?

## Search Strategy
- Start with recent reviews (2023-2024)
- Follow citations to seminal papers
- Check clinical trials databases

## Emerging Directions (added during iteration)
3. NEW: Base editing vs. prime editing comparison
4. NEW: Delivery mechanisms (AAV vs. LNP)

The agent updates this as new directions emerge.

4. Loop Until Exhausted

Why it matters: Research depth varies by topic.

loop_until:
  status: NO_MORE_RESEARCH
  message: |
    If you think all research avenues are exhausted, then
    yield status NO_MORE_RESEARCH

The agent decides when to stop based on:
- Coverage of main questions
- Depth of citation chains followed
- Diminishing returns on new searches

Comparison to 2025 Deep Research Systems

Here's how this simple workflow compares to production deep research systems:

Feature	OpenAI Deep Research	Gemini Deep Research	This Workflow
Multi-step planning	✓ (reasoning model)	✓ (continuous loop)	Partial (agent updates plan)
Multi-hop retrieval	✓ (automatic)	✓ (automatic)	✗ (manual citation following)
Browser automation	✓	✓	✗ (relies on agent capabilities)
Citation tracking	✓ (automatic)	✓ (automatic)	Partial (agent writes to files)
Iterative refinement	✓	✓	✓ (loop_until)
Structured output	✓ (reports)	✓ (multi-page)	✓ (REPORT.md)
Dynamic re-planning	✓ (adaptive)	✓ (adaptive)	✗ (static workflow)
Code execution	✓	✓	✗ (not implemented)
Human oversight	✓ (approve queries)	✓	✓ (inspect workspace)
Transparency	✓ (show sources)	✓ (auditable)	✓ (all files saved)

What this workflow provides:
- Simple, understandable pattern for iterative research
- Full control over file organization and structure
- Transparent process (all intermediate files saved)
- Easy to customize and extend
- Works with any agent that can read/write files

What's missing (compared to full deep research systems):
- Automatic citation graph traversal
- Dynamic workflow adaptation
- Specialized retrieval tools (e.g., academic databases)
- Automated source quality assessment
- Multi-agent coordination

Tips and Best Practices

Tip 1: Start with Focused Queries

Good:

--param query="CRISPR base editing for sickle cell disease"

Too broad:

--param query="gene therapy"

Focused queries lead to deeper, more useful reports.

Tip 2: Monitor the Citation Graph

# Count how often each paper is cited, most-cited first
grep -o 'cites \[[^]]*]' /tmp/research-output/CITATION_GRAPH.md | sort | uniq -c | sort -rn

This reveals which papers are central to the topic.

Tip 3: Resume from Interruptions

If the workflow stops, you can resume:

# First run (interrupted)
cyberian run deep-research.yaml --workdir /tmp/research-output \
  --param query="..." --param workdir="/tmp/research-output"

# Resume with the same workdir (agent sees existing PLAN.md, REPORT.md, and continues)
cyberian run deep-research.yaml --workdir /tmp/research-output \
  --param query="..." --param workdir="/tmp/research-output"

The agent will see existing files and continue from where it left off.

Tip 4: Use Version Control

cd /tmp/research-output
git init
git add .
git commit -m "Initial research snapshot"

# After each major update (git add -A also stages newly downloaded files)
git add -A
git commit -m "Added 5 more papers on base editing"

This lets you track how understanding evolved.

Common Issues

Issue: Agent stops too early

Cause: Agent thinks research is exhausted

Solution: Make loop condition more specific:

loop_until:
  status: NO_MORE_RESEARCH
  message: |
    Only yield NO_MORE_RESEARCH if:
    1. You've covered all questions in PLAN.md
    2. You've followed at least 3 citation levels deep
    3. You've examined at least 20 sources
    4. Recent searches return no new relevant papers
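
The depth criterion can also be checked from outside the agent by reading CITATION_GRAPH.md. The sketch below is an assumption-laden illustration: it interprets "citation levels" as hops along `[a] cites [b]` edges, and `chain_depth` is a hypothetical helper rather than anything cyberian provides.

```python
import re

# Finds every "[a] cites [b]" pair anywhere in the graph text.
EDGE = re.compile(r"\[([^\]]+)\]\s+cites\s+\[([^\]]+)\]")


def chain_depth(graph_text):
    """Return the longest citation chain, in hops, found in the graph text."""
    cites = {}
    for citing, cited in EDGE.findall(graph_text):
        cites.setdefault(citing, set()).add(cited)

    def depth(node, seen):
        # `seen` holds the current path so mutual citations cannot loop forever.
        best = 0
        for nxt in cites.get(node, ()):
            if nxt not in seen:
                best = max(best, 1 + depth(nxt, seen | {nxt}))
        return best

    return max((depth(node, {node}) for node in cites), default=0)
```

The recursion explores every path, which is fine for the few dozen sources a run like this collects but would need memoization for large graphs.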

Issue: Citation files not organized

Cause: Agent not following naming convention

Solution: Be more explicit in instructions:

CRITICAL: File naming must follow this exact pattern:
  {author}-{year}-{keyword}-{type}.{ext}

Examples:
  ✓ smith-2002-autophagy-paper.pdf
  ✓ jones-2023-crispr-abstract.md
  ✗ paper1.pdf (wrong - no structure)
  ✗ Smith et al 2002.pdf (wrong - spaces, capitals)
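
To see which files drifted from the pattern after a run, a small validator can be pointed at the citations/ listing. This is a sketch, not a cyberian feature: `naming_problem` is a hypothetical helper, and the allowed type suffixes and extensions are assumed from the examples in this guide.

```python
import re

PATTERN = re.compile(
    r"^[a-z]+-\d{4}-[a-z0-9]+-(paper|fulltext|abstract|summary)\.(pdf|txt|md)$"
)


def naming_problem(filename):
    """Return a short reason the name breaks the convention, or None if it conforms."""
    if " " in filename:
        return "contains spaces"
    if filename != filename.lower():
        return "contains capitals"
    if not PATTERN.match(filename):
        return "does not match {author}-{year}-{keyword}-{type}.{ext}"
    return None
```

Running it over the folder listing and printing only the names with a non-None result gives a quick list of files to rename.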

Issue: Duplicates being downloaded

Cause: Agent not checking existing files

Solution: Add explicit check in instructions:

Before downloading any paper:
1. Check if it already exists in citations/
2. Check CITATION_GRAPH.md for the citation
3. Only download if truly new
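
The same check can be run by hand when auditing a workspace. The sketch below is a hypothetical helper, not part of cyberian, and it assumes short author-year keys such as `smith-2002`, matching the bracketed form used in CITATION_GRAPH.md; two different papers by the same first author in the same year would collide under this assumption.

```python
def already_collected(key, citation_files, graph_text):
    """Report whether a source key already appears in citations/ or the graph.

    key is an author-year string such as "smith-2002" (assumed form).
    """
    in_files = any(name.startswith(key + "-") for name in citation_files)
    in_graph = f"[{key}]" in graph_text
    return in_files or in_graph
```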

See Also

Tag Documents Walkthrough - Validation patterns
Use Templates - Advanced Jinja2 templating
Provider Calls - External API integration
Troubleshooting - Workflow debugging