API Reference

Developer Documentation

Integrate DocExtractor into your workflow. Upload documents, trigger scans, and export structured data — all via simple HTTP endpoints.

Quick Start

Get up and running in minutes with session-based authentication.

1. Sign In: Create an account or log in to get your session credentials.
2. Create a Collection: Organize your documents into collections for batch processing.
3. Upload & Extract: Upload files, trigger a scan, and export your structured data.

Authentication

All API endpoints require an authenticated session. Log in via /auth/ to obtain a session cookie. Include this cookie in all subsequent requests.
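Here is a minimal Python sketch of session-based access that uses only the standard library. It assumes a hypothetical host (`BASE_URL`). It also assumes that `/auth/` accepts a form-encoded POST with `username` and `password` fields. Neither the field names nor the request format are documented, so treat them as placeholders. The cookie jar captures the session cookie from the login response and resends it on every later request made through the same opener.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical host; substitute your DocExtractor base URL.
BASE_URL = "https://docextractor.example.com"


def make_session() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Return an opener whose cookie jar stores and resends the session cookie."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a form-encoded POST to /auth/.

    ASSUMPTION: the "username"/"password" field names are hypothetical;
    check what your login form actually expects.
    """
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/auth/",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def login(opener: urllib.request.OpenerDirector, base_url: str,
          username: str, password: str) -> None:
    """Send the login request; the opener's cookie jar keeps any cookie it sets."""
    with opener.open(build_login_request(base_url, username, password)) as resp:
        resp.read()
```

After calling `login(opener, BASE_URL, ...)`, send every subsequent request through the same `opener`. The session cookie is then attached automatically.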

API Endpoints

Reference for the file, scanning, export, and notification endpoints.

File Operations

POST /collections/upload/{collection_id}/ Upload files to a collection
POST /collections/file/delete/{file_id}/ Delete a file
GET /collections/file/status/{file_id}/ Get file processing status and precision
GET /collections/file/{file_id}/version/{version_id}/json/ Get extracted data for a file version
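The upload endpoint receives files in the request body, but the docs don't specify the encoding. This Python sketch assumes a standard `multipart/form-data` upload, built with the standard library, and a hypothetical form field named `files`. It sends one file per request. Whether the endpoint accepts several files in one call isn't covered here.

```python
import mimetypes
import urllib.request
import uuid


def build_multipart(field_name: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode a single file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random, so it won't collide with file bytes in practice
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, collection_id: int, filename: str,
                         content: bytes) -> urllib.request.Request:
    """POST one file to /collections/upload/{collection_id}/.

    ASSUMPTION: the form field name "files" is hypothetical; the docs don't name it.
    """
    body, content_type = build_multipart("files", filename, content)
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/collections/upload/{collection_id}/",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
```

Send the resulting request through an opener that carries your session cookie. For example: `opener.open(build_upload_request(BASE_URL, 7, "invoice.pdf", data))`.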

Scanning

POST /collections/file/scan/{file_id}/ Trigger AI scan on a single file
POST /collections/collection/scan/{collection_id}/ Trigger AI scan on all files in a collection
GET /collections/collection/progress/{collection_id}/ Poll scan progress and status
POST /collections/collection/stop/{collection_id}/ Cancel a running scan
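A collection scan runs in the background, so a client typically polls the progress endpoint until the status changes. This Python sketch takes the HTTP call as a callable, which keeps the loop independent of any particular client. The only in-progress status shown in these docs is `"scanning"`. Treating any other value as finished is an assumption.

```python
import time
from typing import Callable


def wait_for_scan(fetch_progress: Callable[[], dict], interval: float = 2.0,
                  timeout: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the scan leaves the "scanning" state, or raise TimeoutError.

    fetch_progress is any callable that GETs
    /collections/collection/progress/{collection_id}/ and returns the decoded
    JSON, e.g. {"scan_progress": 75, "status": "scanning"}.
    ASSUMPTION: any status other than "scanning" is treated as finished.
    """
    deadline = time.monotonic() + timeout
    while True:
        progress = fetch_progress()
        if progress.get("status") != "scanning":
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"scan still running after {timeout} seconds")
        sleep(interval)
```

If polling times out, you can cancel the scan with `POST /collections/collection/stop/{collection_id}/`.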

Export

GET /collections/files/{file_id}/{version_id}/export/excel/ Download file data as Excel
GET /collections/files/{file_id}/{version_id}/export/csv/ Download file data as CSV
GET /collections/files/{file_id}/{version_id}/export/json/ Download file data as JSON
GET /collections/collection/{collection_id}/export/excel/ Download entire collection as Excel
GET /collections/collection/{collection_id}/export/csv/ Download entire collection as CSV
GET /collections/collection/{collection_id}/export/json/ Download entire collection as JSON
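The six export endpoints differ only in scope (a single file version or a whole collection) and in format. A client can therefore build all six paths from two small helpers. In this Python sketch, the helper names are hypothetical. `download` expects an opener that already carries your session cookie.

```python
import shutil
import urllib.request

EXPORT_FORMATS = ("excel", "csv", "json")


def _check_format(fmt: str) -> None:
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported export format {fmt!r}; expected one of {EXPORT_FORMATS}")


def file_export_path(file_id: int, version_id: int, fmt: str) -> str:
    """Path for exporting one file version."""
    _check_format(fmt)
    return f"/collections/files/{file_id}/{version_id}/export/{fmt}/"


def collection_export_path(collection_id: int, fmt: str) -> str:
    """Path for exporting every file in a collection."""
    _check_format(fmt)
    return f"/collections/collection/{collection_id}/export/{fmt}/"


def download(opener: urllib.request.OpenerDirector, base_url: str,
             path: str, dest: str) -> None:
    """Stream an export to disk without loading it fully into memory."""
    with opener.open(base_url.rstrip("/") + path) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```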

Notifications

POST /notifications/mark-read/{notification_id}/ Mark a notification as read
POST /notifications/mark-unread/{notification_id}/ Mark a notification as unread
POST /notifications/delete/{notification_id}/ Delete a notification
POST /notifications/read-all/ Mark all notifications as read
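All four notification endpoints are POSTs. Only `read-all` omits the notification ID. This Python sketch builds the matching request and rejects a missing or unexpected ID early. The function name is hypothetical, and sending an empty body is an assumption.

```python
import urllib.request
from typing import Optional

# Actions that operate on a single notification.
PER_NOTIFICATION_ACTIONS = ("mark-read", "mark-unread", "delete")


def notification_request(base_url: str, action: str,
                         notification_id: Optional[int] = None) -> urllib.request.Request:
    """Build the POST request for a notification action.

    "read-all" takes no ID; the other actions require one.
    """
    base = base_url.rstrip("/")
    if action == "read-all":
        if notification_id is not None:
            raise ValueError("read-all does not take a notification_id")
        path = "/notifications/read-all/"
    elif action in PER_NOTIFICATION_ACTIONS:
        if notification_id is None:
            raise ValueError(f"{action} requires a notification_id")
        path = f"/notifications/{action}/{notification_id}/"
    else:
        raise ValueError(f"unknown action {action!r}")
    # ASSUMPTION: an empty POST body is sufficient for these endpoints.
    return urllib.request.Request(base + path, data=b"", method="POST")
```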

Response Examples

JSON endpoints return responses shaped like the examples below. Export endpoints return file downloads in the requested format instead.

Success Response
// POST /collections/file/scan/{file_id}/
{
  "status": "processing",
  "file_id": 42,
  "message": "Scan enqueued"
}
File Status
// GET /collections/file/status/{file_id}/
{
  "status": "scanned",
  "precision": 98.5
}
Scan Progress
// GET /collections/collection/progress/{collection_id}/
{
  "scan_progress": 75,
  "status": "scanning"
}
Error Response
// Any endpoint on failure
{
  "error": "File not found",
  "status": 404
}
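In the examples above, `status` is a string in successful responses but an integer HTTP-style code in the error response. A client therefore can't branch on `status` alone. This Python sketch checks for the presence of `error` instead. `ApiError` and `parse_response` are hypothetical names. The docs don't say whether the server also sets a non-2xx HTTP status. If it does, `urllib` raises `urllib.error.HTTPError`, and you can pass `err.read()` to `parse_response` to recover the message.

```python
import json
from typing import Optional, Union


class ApiError(Exception):
    """Raised when a response body carries an "error" field."""

    def __init__(self, message: str, status: Optional[int] = None) -> None:
        super().__init__(message)
        self.status = status


def parse_response(body: Union[str, bytes]) -> dict:
    """Decode a JSON response and raise ApiError if it is an error payload."""
    data = json.loads(body)
    if "error" in data:
        raise ApiError(data["error"], data.get("status"))
    return data
```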

Export Formats

Choose the format that fits your workflow.

Excel (.xlsx): Spreadsheet format with multiple sheets. Ready for analysis in Excel or Google Sheets.

CSV (.csv): Comma-separated values, delivered in a ZIP archive. Universal format for data pipelines and databases.

JSON (.json): Structured JSON output. Ideal for API integrations and programmatic access.
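CSV exports arrive as a ZIP archive, so a consumer has to unpack them before parsing. This minimal Python sketch uses only the standard library. It assumes each `.csv` member has a header row and is UTF-8 encoded, neither of which is documented. It makes no assumption about member names. The function name is hypothetical.

```python
import csv
import io
import zipfile


def read_csv_export(zip_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Return {member name: rows} for every .csv member of an export ZIP.

    ASSUMPTION: members are UTF-8 (a BOM is tolerated) and start with a header row.
    """
    tables: dict[str, list[dict[str, str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip directories and non-CSV members
            with io.TextIOWrapper(archive.open(name), encoding="utf-8-sig", newline="") as text:
                tables[name] = list(csv.DictReader(text))
    return tables
```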
