Developer Documentation
Integrate DocExtractor into your workflow. Upload documents, trigger scans, and export structured data, all over HTTP endpoints.
Quick Start
Get up and running in minutes with session-based authentication.
Sign In
Create an account or log in to get your session credentials.
Create a Collection
Organize your documents into collections for batch processing.
Upload & Extract
Upload files, trigger a scan, and export your structured data.
Authentication
All API endpoints require an authenticated session. Log in via /auth/ to obtain a session cookie. Include this cookie in all subsequent requests.
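The cookie flow described above could be sketched in Python with the standard library alone. The host `https://docextractor.example.com`, the form field names `username` and `password`, and the choice of a form-encoded POST are assumptions for illustration; the documentation only states that logging in via /auth/ yields a session cookie.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "https://docextractor.example.com"  # hypothetical host


def make_session():
    """Return an opener that stores cookies and replays them on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, base_url=BASE_URL):
    """Build (but do not send) a form-encoded POST to /auth/.

    The field names are assumed; check them against the real login form.
    """
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(
        base_url + "/auth/", data=body.encode("utf-8"), method="POST"
    )
```

Calling `opener.open(build_login_request(...))` would send the login request; any session cookie in the response is kept in `jar`, and later calls through the same `opener` include it automatically.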
API Endpoints
Complete reference for all available endpoints.
File Operations
/collections/upload/{collection_id}/
Upload files to a collection
/collections/file/delete/{file_id}/
Delete a file
/collections/file/status/{file_id}/
Get file processing status and precision
/collections/file/{file_id}/version/{version_id}/json/
Get extracted data for a file version
Scanning
/collections/file/scan/{file_id}/
Trigger AI scan on a single file
/collections/collection/scan/{collection_id}/
Trigger AI scan on all files in a collection
/collections/collection/progress/{collection_id}/
Poll scan progress and status
/collections/collection/stop/{collection_id}/
Cancel a running scan
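A typical client triggers a collection scan and then polls the progress endpoint until the scan is over. The loop below is a rough sketch of that pattern. It takes the HTTP call as an injected `fetch_progress` callable (hypothetical), and it treats any status other than "scanning" as finished, which is an assumption: the documentation does not list the terminal status values.

```python
import time


def wait_for_scan(fetch_progress, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll until the scan stops reporting "scanning", then return the last payload.

    fetch_progress: zero-argument callable returning the decoded JSON dict from
    /collections/collection/progress/{collection_id}/.
    Treating every status other than "scanning" as finished is an assumption.
    """
    waited = 0.0
    while True:
        payload = fetch_progress()
        if payload.get("status") != "scanning":
            return payload
        if waited >= timeout:
            raise TimeoutError(f"scan still running after {timeout} seconds")
        sleep(interval)
        waited += interval
```

On timeout, a caller might choose to cancel the scan through the stop endpoint rather than leave it running.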
Export
/collections/files/{file_id}/{version_id}/export/excel/
Download file data as Excel
/collections/files/{file_id}/{version_id}/export/csv/
Download file data as CSV
/collections/files/{file_id}/{version_id}/export/json/
Download file data as JSON
/collections/collection/{collection_id}/export/excel/
Download entire collection as Excel
/collections/collection/{collection_id}/export/csv/
Download entire collection as CSV
/collections/collection/{collection_id}/export/json/
Download entire collection as JSON
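The six export paths differ only in scope (file version or whole collection) and format, so a client can build them from two small helpers. This is an illustrative sketch; the function names are hypothetical, and only the path templates come from the list above.

```python
FORMATS = ("excel", "csv", "json")


def _check_format(fmt):
    """Reject any format that the endpoint list does not mention."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")


def file_export_path(file_id, version_id, fmt):
    """Path for exporting one version of one file."""
    _check_format(fmt)
    return f"/collections/files/{file_id}/{version_id}/export/{fmt}/"


def collection_export_path(collection_id, fmt):
    """Path for exporting every file in a collection."""
    _check_format(fmt)
    return f"/collections/collection/{collection_id}/export/{fmt}/"
```

Note that file-level exports live under /collections/files/ (plural), while the other file operations use /collections/file/.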
Notifications
/notifications/mark-read/{notification_id}/
Mark a notification as read
/notifications/mark-unread/{notification_id}/
Mark a notification as unread
/notifications/delete/{notification_id}/
Delete a notification
/notifications/read-all/
Mark all notifications as read
Response Examples
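Endpoints return JSON, except for Excel and CSV exports, which return files on success. In the examples below, success payloads carry a "status" string, while failures carry an "error" message and a numeric "status" code.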
// POST /collections/file/scan/{file_id}/
{ "status": "processing", "file_id": 42, "message": "Scan enqueued" }

// GET /collections/file/status/{file_id}/
{ "status": "scanned", "precision": 98.5 }

// GET /collections/collection/progress/{collection_id}/
{ "scan_progress": 75, "status": "scanning" }

// Any endpoint on failure
{ "error": "File not found", "status": 404 }
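Because the failure payload shown above is distinguished by its "error" key, a client can route every response through one decoding step. The sketch below assumes that the presence of "error" is what marks a failure; `ApiError` and `parse_response` are hypothetical names, not part of DocExtractor.

```python
import json


class ApiError(Exception):
    """Raised when a response body carries an "error" key."""

    def __init__(self, message, status):
        super().__init__(f"{status}: {message}")
        self.message = message
        self.status = status


def parse_response(body):
    """Decode a JSON response body and raise ApiError for failure payloads."""
    payload = json.loads(body)
    if isinstance(payload, dict) and "error" in payload:
        raise ApiError(payload["error"], payload.get("status"))
    return payload
```

Checking the key rather than the "status" field avoids confusing the numeric error code with the status strings used in success payloads.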
Export Formats
Choose the format that fits your workflow.
Excel
Spreadsheet format with multiple sheets. Ready for analysis in Excel or Google Sheets.
CSV
Comma-separated values in a ZIP archive. Universal format for data pipelines and databases.
JSON
Structured JSON output. Ideal for API integrations and programmatic access.
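Since the CSV export arrives as a ZIP archive, a consumer has to unpack it before parsing. A minimal sketch, assuming the archive contains one or more UTF-8 encoded .csv members (the member names and encoding are not specified in this documentation):

```python
import csv
import io
import zipfile


def read_csv_archive(data):
    """Return {member_name: list_of_rows} for every .csv file in a ZIP archive.

    data: the raw bytes of the downloaded archive.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV member
            text = archive.read(name).decode("utf-8-sig")  # encoding is assumed
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables
```

Each value is a list of rows, with every row a list of strings; header handling is left to the caller because the column layout is not documented here.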
Ready to Start Building?
Create your account and start integrating DocExtractor into your workflow today.