Getting Started

This tool accelerates M&A due diligence by analyzing your entire data room across nine specialist domains (Legal, Finance, Commercial, ProductTech, Cybersecurity, HR, Tax, Regulatory, ESG). It helps your deal team find what gets buried, cross-reference it across domains, and trace every finding to an exact page and quote.

31% of M&A failures trace back to due diligence shortcomings, often because workstreams run in silos with no cross-referencing. This tool runs all nine workstreams simultaneously, cross-references findings across domains, and produces structured analysis your team can use as the foundation for IC memos, advisor reports, or negotiation checklists.

This tool does not replace professional advisors. Legal, financial, and regulatory conclusions should always be made by qualified professionals. This tool helps your team and advisors work more efficiently.

Prerequisites

Python 3.12 or later. Check with python3 --version; if you need to install or upgrade, download it from python.org.
An Anthropic API key (available from the Anthropic Console). AWS Bedrock credentials also work.
A data room folder containing the contracts and documents to analyze.
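
As a rough illustration of the first and third prerequisites, the sketch below checks the interpreter version and the presence of a data room folder. The helper names and the ./data_room path are hypothetical choices for this example and are not part of the tool.

```python
import os
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites above


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the documented minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def data_room_ok(path: str) -> bool:
    """Return True if the data room path exists and is a directory."""
    return os.path.isdir(path)


if __name__ == "__main__":
    # ./data_room is a placeholder; point this at your own folder.
    print("Python 3.12+:", python_ok())
    print("Data room found:", data_room_ok("./data_room"))
```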

Installation

pip install "dd-agents[pdf]"

This installs the tool and all required dependencies, including PDF extraction support.

Alternative: isolated install with pipx (recommended for CLI tools — avoids conflicts with other Python packages):

pipx install "dd-agents[pdf]"

Install from source (for development)

git clone https://github.com/zoharbabin/due-diligence-agents.git
cd due-diligence-agents
pip install -e ".[dev,pdf]"

Optional Extras

Install these for additional capabilities:

pip install "dd-agents[vector]"     # Semantic search across documents (ChromaDB)
pip install "dd-agents[ocr]"        # OCR for scanned PDFs (English)
pip install "dd-agents[glm-ocr]"    # Multilingual OCR (100+ languages, Apple Silicon)

Optional System Dependencies

| Dependency | macOS | Linux | Purpose |
|---|---|---|---|
| `poppler` | `brew install poppler` | `apt install poppler-utils` | Fallback PDF extraction |
| `tesseract` | `brew install tesseract` | `apt install tesseract-ocr` | OCR for scanned PDFs |

These are optional — the tool works without them but may produce lower-quality text from some scanned documents.

API Key Setup

You need an API key to run the analysis. Choose one method:

Option A — .env file (recommended, persists across terminal sessions):

cp .env.example .env

Then edit .env and set your key:

ANTHROPIC_API_KEY=sk-ant-...
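
To catch a .env file that still contains the placeholder value, a check along these lines may help. This is a simplified standard-library sketch of KEY=VALUE parsing, not the tool's own loader; the function names are hypothetical, and treating a value that ends in "..." as a placeholder is an assumption made for this example.

```python
def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(text: str) -> bool:
    """True if ANTHROPIC_API_KEY is set and is not the sk-ant-... placeholder."""
    key = parse_env_text(text).get("ANTHROPIC_API_KEY", "")
    return bool(key) and not key.endswith("...")
```

For example, has_api_key(open(".env").read()) returns False until the placeholder has been replaced with a real key.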

Option B — Environment variable (temporary, lasts until you close the terminal):

export ANTHROPIC_API_KEY="sk-ant-..."

Option C — AWS Bedrock:

export AWS_PROFILE=default
export AWS_REGION=us-east-1
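
The sketch below shows one way to see which of the options above is visible to the current shell. It only inspects the environment variables named in this section; the function name, the precedence given to the Anthropic key, and the requirement that both AWS variables be set are assumptions for illustration, and the tool may resolve credentials differently. A key stored only in .env will not appear here unless something has loaded that file into the environment.

```python
import os
from typing import Mapping, Optional


def detect_credentials(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return 'anthropic', 'bedrock', or None based on the variables set."""
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # Assumption: both variables from Option C must be present.
    if env.get("AWS_PROFILE") and env.get("AWS_REGION"):
        return "bedrock"
    return None


if __name__ == "__main__":
    print("Detected credentials:", detect_credentials())
```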

To override which AI model is used, pass --model-profile economy|standard|premium when running the pipeline (see Running the Pipeline).

Verify Installation

dd-agents version

This prints the installed version. If the command is not found, ensure the package installed correctly and your PATH includes the Python scripts directory.
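
When the command is not found, it can help to tell "package not installed" apart from "installed but not on PATH". The following sketch uses only the Python standard library; the helper names are hypothetical, and the distribution name dd-agents is taken from the install command above.

```python
import shutil
from importlib import metadata


def installed_version(dist_name: str = "dd-agents"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def on_path(command: str = "dd-agents") -> bool:
    """Return True if the command can be found on PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    print("package version:", installed_version())
    print("command on PATH:", on_path())
```

A version with no command on PATH points to a PATH problem. No version at all suggests the package was installed into a different Python environment, or not installed.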

Preparing Your Data Room

Organize your contracts into one folder per subject (counterparty). In the example below, subject folders are nested under group folders:

data_room/
  SubjectGroup_A/
    Acme_Corp/
      master_agreement.pdf
      amendment_2024.pdf
    Beta_Inc/
      license_agreement.pdf
  SubjectGroup_B/
    Gamma_LLC/
      services_contract.docx
  _reference/                    # Optional: reference docs (buyer overview, etc.)
    buyer_overview.pdf

Supported formats: PDF, Word (.docx), Excel (.xlsx), PowerPoint (.pptx), and images. Scanned PDFs are handled via OCR.

Folder structure matters: The tool uses folder names to identify which documents belong to which subject. A flat folder of files with no subfolder structure will still work — the tool groups them as a single entity — but organizing by subject produces better results.
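
To preview how a folder layout might be read, the sketch below groups files by their group and subject folders. It is an illustration under stated assumptions, not the tool's actual grouping logic: it assumes the two-level layout shown above, skips top-level folders whose names start with an underscore (such as _reference), and collects anything shallower under a hypothetical "(ungrouped)" key.

```python
from collections import defaultdict
from pathlib import Path


def group_by_subject(data_room: Path) -> dict:
    """Map 'Group/Subject' to the file names found beneath that folder."""
    groups = defaultdict(list)
    for path in sorted(data_room.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(data_room).parts
        if parts[0].startswith("_"):
            continue  # e.g. _reference/ holds supporting docs, not subjects
        # Files need at least group/subject/file to be assigned to a subject.
        key = "/".join(parts[:2]) if len(parts) >= 3 else "(ungrouped)"
        groups[key].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    root = Path("./data_room")  # placeholder path
    if root.is_dir():
        for subject, files in group_by_subject(root).items():
            print(f"{subject}: {len(files)} file(s)")
```

A large "(ungrouped)" bucket is a hint that the data room would benefit from reorganizing before a run.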

A pre-built sample data room is included at examples/quickstart/sample_data_room/ so you can try the tool before setting up your own files.

Pre-Flight Check

Before running the full pipeline, assess your data room quality:

dd-agents assess ./data_room

This reports file type distribution, extraction readiness, and an overall completeness score. Address any critical issues before proceeding.
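
For a sense of what a file type distribution looks like, here is a minimal sketch that counts files by extension. It does not reproduce how dd-agents assess computes its report or its completeness score. The document extensions come from the supported formats listed earlier; the image extensions are an assumption, since the guide only says "images".

```python
from collections import Counter
from pathlib import Path

# Extensions named in the supported-formats list.
DOCUMENT_EXTS = {".pdf", ".docx", ".xlsx", ".pptx"}
# Assumption: the guide says "images" without listing extensions.
ASSUMED_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def extension_counts(data_room: Path) -> Counter:
    """Count files under data_room by lowercase extension."""
    return Counter(p.suffix.lower() for p in data_room.rglob("*") if p.is_file())


def unrecognized(counts: Counter) -> dict:
    """Return the extensions that fall outside the sets above."""
    known = DOCUMENT_EXTS | ASSUMED_IMAGE_EXTS
    return {ext: n for ext, n in counts.items() if ext not in known}


if __name__ == "__main__":
    root = Path("./data_room")  # placeholder path
    if root.is_dir():
        counts = extension_counts(root)
        print("By extension:", dict(counts))
        print("Not in the lists above:", unrecognized(counts))
```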

First Run

The typical workflow is three steps: generate a config, run the pipeline, review the report.

1. Generate a Deal Configuration

The fastest path is auto-config, which uses AI to scan your data room and produce a complete configuration:

dd-agents auto-config "Acme Corp" "Target Inc" --data-room ./data_room

This produces a deal-config.json with buyer/target details, company name variants, focus areas, and data room mapping. See Deal Configuration for details.
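
Before running the pipeline, it may be worth confirming that the generated file parses as JSON and glancing at its top-level sections. The sketch below makes no assumption about the schema (see Deal Configuration for that); it only lists whatever keys are present, and the function name is hypothetical.

```python
import json
from pathlib import Path


def top_level_keys(config_text: str) -> list:
    """Parse the config text as JSON and return its top-level keys, sorted."""
    data = json.loads(config_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)


if __name__ == "__main__":
    path = Path("deal-config.json")
    if path.exists():
        print(top_level_keys(path.read_text(encoding="utf-8")))
```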

To preview the config without writing it:

dd-agents auto-config "Acme Corp" "Target Inc" --data-room ./data_room --dry-run

Alternatively, generate a config interactively without any API calls:

dd-agents init --data-room ./data_room

2. Run the Pipeline

dd-agents run deal-config.json

The pipeline extracts text, matches company names, runs AI analysis across all specialist domains, validates quality, and generates the report.

To preview what will happen without making API calls:

dd-agents run deal-config.json --dry-run

For a quick red-flag triage instead of full analysis:

dd-agents run deal-config.json --quick-scan --model-profile economy

See Running the Pipeline for all options including resume, model selection, and quality gates.
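
If you script runs, the invocations above can be assembled programmatically. This sketch uses only the flags shown in this section (--dry-run, --quick-scan, --model-profile); the function name is hypothetical, and whether every combination of flags is accepted is an assumption to confirm against Running the Pipeline.

```python
import shlex

PROFILES = ("economy", "standard", "premium")  # values listed for --model-profile


def build_run_command(config, dry_run=False, quick_scan=False, model_profile=None):
    """Return the argument list for `dd-agents run` with the chosen options."""
    cmd = ["dd-agents", "run", config]
    if dry_run:
        cmd.append("--dry-run")
    if quick_scan:
        cmd.append("--quick-scan")
    if model_profile is not None:
        if model_profile not in PROFILES:
            raise ValueError(f"unknown model profile: {model_profile!r}")
        cmd += ["--model-profile", model_profile]
    return cmd


if __name__ == "__main__":
    # Print the command rather than executing it; pass the list to
    # subprocess.run(cmd, check=True) to actually launch the pipeline.
    cmd = build_run_command("deal-config.json", quick_scan=True, model_profile="economy")
    print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems when the config path contains spaces.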

3. Review the Report

After the pipeline completes, find the outputs in _dd/forensic-dd/runs/latest/report/:

dd_report.html -- Interactive HTML report with cross-domain findings, severity filtering, and drill-down to exact clauses
dd_report.xlsx -- 14-sheet Excel report for detailed analysis and downstream work

Open dd_report.html in a browser. See Reading the Report for a walkthrough of each section.
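
As a convenience, a short script can confirm that both report files exist and then open the HTML report in the default browser. This is a sketch using the paths and file names given above; the helper name is hypothetical.

```python
import webbrowser
from pathlib import Path

REPORT_DIR = Path("_dd/forensic-dd/runs/latest/report")
EXPECTED = ("dd_report.html", "dd_report.xlsx")


def missing_reports(report_dir: Path = REPORT_DIR) -> list:
    """Return the expected report files that are not present in report_dir."""
    return [name for name in EXPECTED if not (report_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_reports()
    if missing:
        print("Missing report files:", ", ".join(missing))
    else:
        # as_uri() needs an absolute path, hence resolve().
        webbrowser.open((REPORT_DIR / "dd_report.html").resolve().as_uri())
```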

Use these reports alongside your advisory process. The structured findings, citations, and cross-references serve as the foundation for your team's own deliverables — board presentations, advisor memos, negotiation checklists, or integration plans.

Post-Run Tools

Interactive Chat

Explore findings in a multi-turn conversation with document tools and persistent memory:

dd-agents chat --report _dd/forensic-dd/runs/latest

Ask follow-up questions, drill into source documents, and verify citations. Insights are saved automatically and recalled in future sessions.

Contract Search

Search contracts with custom questions without running the full pipeline:

dd-agents search prompts.json --data-room ./data_room

Natural Language Query

Ask a single question about findings (for multi-turn conversations, use chat instead):

dd-agents query --report _dd/forensic-dd/runs/latest -q "How many high-severity findings?"

PDF Export

Export the HTML report to a print-ready PDF:

dd-agents export-pdf _dd/forensic-dd/runs/latest/report/dd_report.html

Portfolio Management

Track multiple due diligence projects and compare risk profiles across deals:

dd-agents portfolio add "Alpha Acquisition" --data-room ./alpha_data_room
dd-agents portfolio list
dd-agents portfolio compare

Report Templates

Apply templates for different audiences (Board Summary, Legal Deep Dive, etc.):

dd-agents templates list
dd-agents templates show board_summary

See the CLI Reference for full documentation of all commands.

Docker

Build and run in a container:

docker build -t dd-agents .
docker run -e ANTHROPIC_API_KEY="sk-ant-..." \
  -v "$(pwd)/data_room:/workspace/data_room" \
  -v "$(pwd)/deal-config.json:/workspace/deal-config.json" \
  dd-agents run deal-config.json

Next Steps

Deal Configuration -- Config file structure and generation
Running the Pipeline -- Execution modes and options
Reading the Report -- Navigating the HTML and Excel output
CLI Reference -- Complete command reference