Document Extraction API
Turn documents into data you can trust.
Extract, validate, and structure data from any document. AI flexibility wrapped in deterministic guardrails — so your systems never receive bad data.
The production gap
Pure AI gets you 90%.
Production requires 100%.
LLMs made document extraction easy to start. They didn't make it reliable enough to ship.
The hybrid advantage
AI + Rules = Reliable extraction
Most tools force a choice: AI that's flexible but unpredictable, or templates that are precise but brittle. Sensible combines both.
The result: AI handles variation across 100+ document formats. Rules guarantee the consistency your production systems require.
Platform
Built for production, not prototypes
Every feature exists because pure AI fails without it.
Validation
Schema enforcement on every extraction
Define your output schema once. Sensible validates every extraction against it. Type mismatches, missing required fields, and out-of-range values fail fast — not silently in your database.
- Data type validation (string, number, date, currency)
- Required field enforcement
- Confidence scoring per field
- Cross-field reconciliation logic
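To make the checks listed above concrete, here is a minimal sketch in plain Python of schema enforcement plus one cross-field reconciliation rule. It is not Sensible's implementation or schema format: `FieldRule`, `validate`, `reconcile`, and the bank-statement field names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FieldRule:
    """Hypothetical per-field rule; not Sensible's actual schema format."""
    expected_type: type
    required: bool = True
    min_value: Optional[float] = None


# Hypothetical schema for a bank statement extraction.
SCHEMA = {
    "account_holder": FieldRule(str),
    "statement_date": FieldRule(date),
    "opening_balance": FieldRule(float),
    "closing_balance": FieldRule(float),
    "transaction_count": FieldRule(int, min_value=0),
}


def validate(record: dict, schema: dict) -> List[str]:
    """Return a list of errors; an empty list means the record passed."""
    errors = []
    for name, rule in schema.items():
        value = record.get(name)
        if value is None:
            if rule.required:
                errors.append(f"{name}: missing required field")
            continue
        if not isinstance(value, rule.expected_type):
            errors.append(
                f"{name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        if rule.min_value is not None and value < rule.min_value:
            errors.append(f"{name}: below minimum {rule.min_value}")
    return errors


def reconcile(record: dict, tolerance: float = 0.01) -> List[str]:
    """Cross-field check: opening balance plus transactions should equal closing."""
    expected = record["opening_balance"] + sum(record["transactions"])
    if abs(expected - record["closing_balance"]) > tolerance:
        return [
            f"closing_balance: expected {expected:.2f}, "
            f"got {record['closing_balance']:.2f}"
        ]
    return []


GOOD = {
    "account_holder": "A. Example",
    "statement_date": date(2024, 1, 31),
    "opening_balance": 100.0,
    "closing_balance": 150.0,
    "transaction_count": 2,
    "transactions": [75.0, -25.0],
}

# Fail fast: raise before the record is written anywhere downstream.
problems = validate(GOOD, SCHEMA) + reconcile(GOOD)
if problems:
    raise ValueError("; ".join(problems))
```

The point of the design is that a failed check stops the record before it is written, so a type mismatch or an unbalanced statement never reaches the database.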
Developer experience
Extraction logic as code
SenseML is a declarative configuration language that makes extraction logic explicit. Define fields, validation rules, and fallback behaviors in YAML. Version control it, test it, deploy through CI/CD.
- SenseML configurations in Git
- Regression testing against document samples
- CI/CD deployment integration
- Peer review for changes, like any other code
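Regression testing against document samples might look something like the sketch below: each sample document is paired with a reviewed "golden" output, and a configuration change fails the build if any field drifts. The names `run_regression`, `diff_fields`, and the stand-in `fake_extract` are hypothetical; in a real pipeline the extractor would be whatever function calls your extraction service.

```python
from typing import Callable, Dict, List, Tuple


def diff_fields(expected: dict, actual: dict) -> List[str]:
    """List every field whose extracted value differs from the golden value."""
    diffs = []
    for key in sorted(set(expected) | set(actual)):
        if expected.get(key) != actual.get(key):
            diffs.append(
                f"{key}: expected {expected.get(key)!r}, got {actual.get(key)!r}"
            )
    return diffs


def run_regression(
    extract: Callable[[str], dict],
    samples: Dict[str, Tuple[str, dict]],
) -> Dict[str, List[str]]:
    """Run the extractor over every sample; return only the samples that failed."""
    failures = {}
    for name, (document, expected) in samples.items():
        diffs = diff_fields(expected, extract(document))
        if diffs:
            failures[name] = diffs
    return failures


def fake_extract(document: str) -> dict:
    """Stand-in extractor for the sketch: parses 'key=value' lines."""
    return dict(line.split("=", 1) for line in document.splitlines() if "=" in line)


# Hypothetical sample set: document text paired with its golden output.
SAMPLES = {
    "statement_jan": (
        "holder=A. Example\nclosing_balance=150.00",
        {"holder": "A. Example", "closing_balance": "150.00"},
    ),
}

failures = run_regression(fake_extract, SAMPLES)
# In CI, a non-empty result would fail the build.
assert not failures, failures
```

Because the golden outputs live next to the configuration in Git, a reviewer sees both the logic change and its effect on real samples in the same diff.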
API
A few lines to production
SDKs for JavaScript and Python. RESTful API. Every response includes confidence scores, validation results, and source provenance.
import Sensible from "sensible-api";

const sensible = new Sensible("YOUR_API_KEY");

const result = await sensible.extract({
  url: "https://example.com/statement.pdf",
  type: "bank_statements",
  config: "chase_v2"
});

// Schema-validated, with provenance
console.log(result.parsed_document);

from sensible import SensibleSDK

client = SensibleSDK(api_key="YOUR_API_KEY")

result = client.extract(
    url="https://example.com/statement.pdf",
    document_type="bank_statements",
    configuration="chase_v2"
)

# Every field includes source + confidence
print(result.parsed_document)

curl -X POST https://api.sensible.so/v0/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://...",
    "document_type": "bank_statements",
    "configuration": "chase_v2"
  }'
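Once a response arrives, per-field confidence scores can drive what happens next. The sketch below assumes a hypothetical response shape in which each field of the parsed document is a mapping with `value` and `confidence` keys; the actual response format may differ, so treat the field names and the 0.9 threshold as placeholders.

```python
from typing import Dict, Tuple


def split_by_confidence(
    parsed_document: Dict[str, dict],
    threshold: float = 0.9,
) -> Tuple[dict, dict]:
    """Separate fields that can be accepted automatically from those needing review.

    Assumes each field looks like {"value": ..., "confidence": 0.0-1.0}.
    A field with no confidence score is treated as needing review.
    """
    accepted, needs_review = {}, {}
    for name, field in parsed_document.items():
        confident = field.get("confidence", 0.0) >= threshold
        target = accepted if confident else needs_review
        target[name] = field["value"]
    return accepted, needs_review


# Hypothetical parsed document, for illustration only.
example = {
    "account_holder": {"value": "A. Example", "confidence": 0.99},
    "closing_balance": {"value": 150.0, "confidence": 0.72},
}

accepted, needs_review = split_by_confidence(example)
print("accepted:", accepted)
print("needs review:", needs_review)
```

Routing only the low-confidence fields to a person keeps manual review focused on the values most likely to be wrong.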
Customer story
From prototype to production
"We replaced our internal GPT build with Sensible's hybrid extraction. Manual loss run review dropped from 15% to under 1%. The validation layer catches errors we would have missed — and we can trace every extraction back to the source document."
Ready to extract with confidence?
Free tier with 100 documents per month. No credit card required.