Document Extraction API

Turn documents into data you can trust.

Extract, validate, and structure data from any document. AI flexibility wrapped in deterministic guardrails — so your systems never receive bad data.

SOC 2 Type II · HIPAA Compliant · 100+ document types
JPMorgan Chase
Checking Account Statement · Jan 2026

Account Holder: Acme Corp LLC
Account Number: ****4521
Statement Period: Jan 1 – 31, 2026
Ending Balance: $42,891.50

TRANSACTIONS
DATE    DESCRIPTION           AMOUNT
01/05   ACH - Payroll         -$12,450.00
01/12   Wire - Client PMT     +$28,500.00
01/18   ACH - AWS Infra       -$3,241.72
01/22   ACH - Gusto Payroll   -$12,450.00
01/25   Check #1084           -$1,500.00

Page 1 of 3

SENSIBLE STRUCTURED OUTPUT · Valid ✓

{
  "account_holder": { "value": "Acme Corp LLC", "confidence": 0.99 },
  "account_number": "****4521",
  "period": { "start": "2026-01-01", "end": "2026-01-31" },
  "ending_balance": { "value": 42891.50, "type": "currency", "confidence": 0.98 }
}

11 fields · 6 validations pass · 230ms
bank_statements / chase_v2 · Confidence: 0.98
Trusted by 100+ companies including
Vouch · Spruce · Marble · Inspectify · Claimer · Ledgebrook · AngelList

The production gap

Pure AI gets you 90%.
Production requires 100%.

LLMs made document extraction easy to start. They didn't make it reliable enough to ship.

Pure AI (LLM-only extraction)
ACCURACY ~90%

  • account_holder "Acme Corp LLC"
    Correct
  • ending_balance "42,891" ← string, not number
    Type mismatch breaks downstream calculations
  • routing_number "021000021" ← hallucinated
    Value invented to fill expected schema. Not in source.
  • period_start "January 2026" ← inconsistent format
    Output format varies between runs. Breaks parsing.

2 errors + 1 inconsistency = manual review required

VS

With Sensible (AI + deterministic rules)
ACCURACY 99.7%

  • account_holder "Acme Corp LLC" (conf: 0.99)
    Validated
  • ending_balance 42891.50 (type: currency, conf: 0.98)
    Schema enforcement: correct type, validated range
  • routing_number flagged for review (conf: 0.72)
    Low confidence → routed to human review, not hallucinated
  • period_start "2026-01-01" (ISO 8601 enforced)
    Schema enforcement: consistent format every run

4/4 fields validated · 1 flagged for review · 0 errors
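
Each of the three failure modes in the LLM-only column can be checked mechanically. Here is a minimal Python sketch, assuming you have the extracted value, the type you expect, and the raw text of the source document. The function name audit_field and its rules are illustrative assumptions, not Sensible's implementation.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def audit_field(value, expected_type, source_text):
    """Return a list of problems found for one extracted value (hypothetical checks)."""
    problems = []
    # Type mismatch: a currency field must be a number, not a formatted string.
    if expected_type == "currency" and not isinstance(value, (int, float)):
        problems.append("type mismatch")
    # Inconsistent format: dates must be ISO 8601 (YYYY-MM-DD) and real calendar dates.
    if expected_type == "date":
        if not (isinstance(value, str) and ISO_DATE.fullmatch(value)):
            problems.append("not ISO 8601")
        else:
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append("invalid date")
    # Hallucination: a string value that never appears in the source text is suspect.
    if expected_type == "string" and str(value) not in source_text:
        problems.append("not found in source")
    return problems

# Values taken from the comparison above; the source text is a shortened stand-in.
source = "Account Holder Acme Corp LLC Ending Balance $42,891.50"
for value, kind in [("Acme Corp LLC", "string"), ("42,891", "currency"),
                    ("021000021", "string"), ("January 2026", "date")]:
    print(value, audit_field(value, kind, source))
```

A substring match is a crude grounding test. It would miss values that are reformatted between source and output, so treat it as a starting point.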

The hybrid advantage

AI + Rules = Reliable extraction

Most tools force a choice: AI that's flexible but unpredictable, or templates that are precise but brittle. Sensible combines both.

Document Input (PDF · Image · Spreadsheet)
  → AI Extraction (semantic understanding) + Layout Rules (structural precision)
  → MERGE
  → Schema Validation (type checks · required fields · confidence)
  → Verified JSON (schema-compliant · auditable)

The result: AI handles variation across 100+ document formats. Rules guarantee the consistency your production systems require.
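
To make the merge step concrete, here is a minimal Python sketch. This page does not say how the two sources are reconciled, so the policy below (rule-based values win, and disagreements are recorded for review) is an assumption. The name merge_extractions is hypothetical.

```python
def merge_extractions(ai_fields, rule_fields):
    """Combine two field dicts; rule-based values win, disagreements are recorded."""
    merged, conflicts = {}, []
    for name in sorted(set(ai_fields) | set(rule_fields)):
        ai_value = ai_fields.get(name)
        rule_value = rule_fields.get(name)
        if rule_value is not None:
            merged[name] = rule_value
            # Keep a note when the AI disagreed, so the field can be audited.
            if ai_value is not None and ai_value != rule_value:
                conflicts.append(name)
        else:
            # No rule matched: fall back to the AI value.
            merged[name] = ai_value
    return merged, conflicts

# Illustrative inputs based on the sample statement on this page.
ai = {"account_holder": "Acme Corp LLC", "ending_balance": "42,891"}
rules = {"ending_balance": 42891.50, "account_number": "****4521"}
merged, conflicts = merge_extractions(ai, rules)
print(merged)
print(conflicts)
```

The merged record would then go through schema validation before anything downstream sees it.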

Platform

Built for production, not prototypes

Every feature exists because pure AI fails without it.

Validation

Schema enforcement on every extraction

Define your output schema once. Sensible validates every extraction against it. Type mismatches, missing required fields, and out-of-range values fail fast — not silently in your database.

  • Data type validation (string, number, date, currency)
  • Required field enforcement
  • Confidence scoring per field
  • Cross-field reconciliation logic
SCHEMA VALIDATION · 5/6 pass · 1 review

FIELD            RULES                                CONFIDENCE
account_holder   string · required                    0.99
ending_balance   currency · required                  0.98
period_start     date · required · ISO 8601           0.99
transactions[]   array · min: 1                       0.97
period_end       date · required · ISO 8601           0.99
routing_number   string · optional · low confidence   0.72 (review)
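
The checks in this panel can be sketched in a few lines of Python. This is a rough illustration, not how Sensible implements validation. The schema layout, the 0.90 review threshold, and the placeholder routing number are all assumptions, and currency is modeled as a float for brevity.

```python
from datetime import date

# Hypothetical schema mirroring the panel above.
SCHEMA = {
    "account_holder": {"type": str, "required": True},
    "ending_balance": {"type": float, "required": True},
    "period_start": {"type": "iso_date", "required": True},
    "transactions": {"type": list, "min_items": 1},
    "period_end": {"type": "iso_date", "required": True},
    "routing_number": {"type": str},
}
REVIEW_THRESHOLD = 0.90  # assumed cutoff; the page does not state one

def check_field(rule, field):
    """Return 'pass', 'review', or 'fail' for one {'value', 'confidence'} field."""
    if field is None:
        return "fail" if rule.get("required", False) else "pass"
    value, confidence = field["value"], field["confidence"]
    if rule["type"] == "iso_date":
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            return "fail"
    elif not isinstance(value, rule["type"]):
        return "fail"
    if isinstance(value, list) and len(value) < rule.get("min_items", 0):
        return "fail"
    return "review" if confidence < REVIEW_THRESHOLD else "pass"

def validate(document):
    """Map every schema field to its status."""
    return {name: check_field(rule, document.get(name)) for name, rule in SCHEMA.items()}

document = {
    "account_holder": {"value": "Acme Corp LLC", "confidence": 0.99},
    "ending_balance": {"value": 42891.50, "confidence": 0.98},
    "period_start": {"value": "2026-01-01", "confidence": 0.99},
    "transactions": {"value": [{"date": "01/05", "amount": -12450.00}], "confidence": 0.97},
    "period_end": {"value": "2026-01-31", "confidence": 0.99},
    "routing_number": {"value": "PLACEHOLDER", "confidence": 0.72},  # placeholder value
}
statuses = validate(document)
print(statuses)
```

With these inputs, five fields pass and routing_number is routed to review, matching the summary line of the panel.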

Developer experience

Extraction logic as code

SenseML is a declarative configuration language that makes extraction logic explicit. Define fields, validation rules, and fallback behaviors in YAML. Version control it, test it, deploy through CI/CD.

  • SenseML configurations in Git
  • Regression testing against document samples
  • CI/CD deployment integration
  • Peer review changes like any other code
chase_v2.yaml

## Bank statement extraction
fields:
  - id: account_holder
    anchor: "Account Holder"
    method:
      id: row
  - id: ending_balance
    anchor: "Ending Balance"
    method:
      id: row
    type: currency
computed_fields:
  - id: balance_check
    value: "ending_balance > 0"
    severity: error
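
Regression testing against document samples can be as simple as comparing each new extraction to a stored, known-good output. Here is a minimal Python sketch. The extract argument is a hypothetical callable standing in for whatever runs a configuration against a document. It is not a Sensible SDK function, and the sample data is illustrative.

```python
def diff_fields(expected, actual):
    """Return {field: (expected, actual)} for every field whose value changed."""
    names = set(expected) | set(actual)
    return {
        name: (expected.get(name), actual.get(name))
        for name in sorted(names)
        if expected.get(name) != actual.get(name)
    }

def run_regression(samples, extract):
    """Run `extract` over each sample and collect diffs against the golden output."""
    failures = {}
    for sample_name, (document, golden) in samples.items():
        changes = diff_fields(golden, extract(document))
        if changes:
            failures[sample_name] = changes
    return failures

# Two stored samples: (document reference, golden output).
samples = {
    "chase_jan": ("chase_jan.pdf", {"account_holder": "Acme Corp LLC", "ending_balance": 42891.50}),
    "chase_feb": ("chase_feb.pdf", {"account_holder": "Acme Corp LLC", "ending_balance": 100.00}),
}

def fake_extract(document):
    """Stand-in extractor that regresses on the February sample."""
    balance = 42891.50 if document == "chase_jan.pdf" else "100.00"
    return {"account_holder": "Acme Corp LLC", "ending_balance": balance}

failures = run_regression(samples, fake_extract)
print(failures)
```

In a CI pipeline, a non-empty failures dict would fail the build before a configuration change is deployed.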

API

A few lines to production

SDKs for JavaScript and Python. RESTful API. Every response includes confidence scores, validation results, and source provenance.

  • Confidence scores on every field
  • Source coordinates for audit trails
  • Schema validation before response
  • Webhook support for async processing
Read the docs

JavaScript

import Sensible from "sensible-api";

const sensible = new Sensible("YOUR_API_KEY");

const result = await sensible.extract({
  url: "https://example.com/statement.pdf",
  type: "bank_statements",
  config: "chase_v2"
});

// Schema-validated, with provenance
console.log(result.parsed_document);

Python

from sensible import SensibleSDK

client = SensibleSDK(api_key="YOUR_API_KEY")

result = client.extract(
    url="https://example.com/statement.pdf",
    document_type="bank_statements",
    configuration="chase_v2"
)

# Every field includes source + confidence
print(result.parsed_document)

cURL

curl -X POST https://api.sensible.so/v0/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://...",
    "document_type": "bank_statements",
    "configuration": "chase_v2"
  }'
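
Once a response arrives, downstream code usually wants plain values plus a single number that says how far to trust the document. The Python sketch below assumes the response body has the shape of the sample structured output near the top of this page, where some fields are wrapped as {"value", "confidence"} and others are bare. The helper names unwrap and confidences are hypothetical.

```python
def unwrap(node):
    """Recursively replace {'value': ..., 'confidence': ...} wrappers with the bare value."""
    if isinstance(node, dict):
        if "value" in node:
            return node["value"]
        return {key: unwrap(child) for key, child in node.items()}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node

def confidences(node):
    """Yield every 'confidence' number found in the nested structure."""
    if isinstance(node, dict):
        if "confidence" in node:
            yield node["confidence"]
        for child in node.values():
            yield from confidences(child)
    elif isinstance(node, list):
        for item in node:
            yield from confidences(item)

# Assumed response shape, copied from the sample output on this page.
parsed_document = {
    "account_holder": {"value": "Acme Corp LLC", "confidence": 0.99},
    "account_number": "****4521",
    "period": {"start": "2026-01-01", "end": "2026-01-31"},
    "ending_balance": {"value": 42891.50, "type": "currency", "confidence": 0.98},
}
plain = unwrap(parsed_document)
lowest = min(confidences(parsed_document))
print(plain)
print(lowest)
```

Taking the minimum confidence is a conservative choice: one uncertain field is enough to send the whole document to review.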

Customer story

From prototype to production

"We replaced our internal GPT build with Sensible's hybrid extraction. Manual loss run review dropped from 15% to under 1%. The validation layer catches errors we would have missed — and we can trace every extraction back to the source document."
Ledgebrook Insurance, Engineering Team

  • 73% less manual review
  • 99%+ extraction accuracy
  • <2s average extraction time

Ready to extract with confidence?

Free tier with 100 documents per month. No credit card required.

SOC 2 Type II · HIPAA Compliant · No credit card