Document Extraction API

Turn documents into data you can trust.

Extract, validate, and structure data from any document. AI flexibility wrapped in deterministic guardrails — so your systems never receive bad data.

Start extracting free | Book a demo

SOC 2 Type II | HIPAA Compliant | 100+ document types

Trusted by 100+ companies including Vouch, Spruce, Marble, Inspectify, Claimer, Ledgebrook, and AngelList

The production gap

Pure AI gets you 90%.
Production requires 100%.

LLMs made document extraction easy to start. They didn't make it reliable enough to ship.

The hybrid advantage

AI + Rules = Reliable extraction

Most tools force a choice: AI that's flexible but unpredictable, or templates that are precise but brittle. Sensible combines both.

The result: AI handles variation across 100+ document formats. Rules guarantee the consistency your production systems require.
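As a rough illustration of the hybrid idea, a model can propose a value in whatever layout the document used, and a deterministic rule decides whether that value is accepted and how it is normalized. The sketch below is not Sensible's implementation; `accept_date` and `KNOWN_FORMATS` are hypothetical names chosen for this example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of date layouts a model might return for the same field.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def accept_date(candidate: str) -> Optional[str]:
    """Return the candidate as an ISO 8601 date, or None if no rule matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(candidate.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known layout
    return None  # rejected: downstream systems never see an unparsed date
```

Every accepted value leaves in one format, and anything the rules cannot verify is rejected instead of passed through.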

Platform

Built for production, not prototypes

Every feature exists because pure AI fails without it.

Validation

Schema enforcement on every extraction

Define your output schema once. Sensible validates every extraction against it. Type mismatches, missing required fields, and out-of-range values fail fast — not silently in your database.

Data type validation (string, number, date, currency)
Required field enforcement
Confidence scoring per field
Cross-field reconciliation logic
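The checks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Sensible's validation engine: `FieldRule`, `SCHEMA`, `validate`, and the bank-statement field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class FieldRule:
    """Hypothetical rule for one output field."""
    type_: type
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # optional range check


# Hypothetical schema for a bank statement summary.
SCHEMA: Dict[str, FieldRule] = {
    "account_number": FieldRule(str),
    "opening_balance": FieldRule(float),
    "total_deposits": FieldRule(float, check=lambda v: v >= 0),
    "total_withdrawals": FieldRule(float, check=lambda v: v >= 0),
    "closing_balance": FieldRule(float),
    "memo": FieldRule(str, required=False),
}


def validate(doc: Dict[str, Any]) -> List[str]:
    """Return a list of errors; an empty list means the document passed."""
    errors: List[str] = []
    for name, rule in SCHEMA.items():
        value = doc.get(name)
        if value is None:
            if rule.required:
                errors.append(f"{name}: missing required field")
            continue
        if not isinstance(value, rule.type_):
            errors.append(f"{name}: expected {rule.type_.__name__}")
            continue
        if rule.check and not rule.check(value):
            errors.append(f"{name}: value out of range")
    # Cross-field reconciliation runs only when every per-field check passed.
    if not errors:
        expected = (
            doc["opening_balance"] + doc["total_deposits"] - doc["total_withdrawals"]
        )
        if abs(expected - doc["closing_balance"]) > 0.005:
            errors.append("closing_balance: does not reconcile with totals")
    return errors
```

Returning every error at once, instead of raising on the first, lets a caller log the full picture before rejecting the document.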

Developer experience

Extraction logic as code

SenseML is a declarative configuration language that makes extraction logic explicit. Define fields, validation rules, and fallback behaviors in YAML. Version control it, test it, deploy through CI/CD.

SenseML configurations in Git
Regression testing against document samples
CI/CD deployment integration
Peer review of changes, like any other code
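Regression testing against document samples might look like the sketch below: each sample has a stored expected output, and any field that differs after a configuration change is reported. The `find_regressions` helper and its arguments are hypothetical; `extract` stands in for whatever function runs an extraction in your pipeline.

```python
from typing import Callable, Dict, List

Extraction = Dict[str, object]


def find_regressions(
    extract: Callable[[str], Extraction],
    golden: Dict[str, Extraction],
) -> List[str]:
    """Compare fresh extractions with stored expected output, field by field."""
    diffs: List[str] = []
    for sample, expected in sorted(golden.items()):
        actual = extract(sample)
        # Check the union of keys so added and dropped fields are both reported.
        for field in sorted(set(expected) | set(actual)):
            if expected.get(field) != actual.get(field):
                diffs.append(
                    f"{sample}: {field} expected {expected.get(field)!r}, "
                    f"got {actual.get(field)!r}"
                )
    return diffs
```

In a CI job, a non-empty result would fail the build, so a configuration change that alters any known-good output is caught before deployment.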

API

A few lines to production

SDKs for JavaScript and Python. RESTful API. Every response includes confidence scores, validation results, and source provenance.

Confidence scores on every field
Source coordinates for audit trails
Schema validation before response
Webhook support for async processing

Read the docs

JavaScript

import Sensible from "sensible-api";

const sensible = new Sensible("YOUR_API_KEY");

const result = await sensible.extract({
  url: "https://example.com/statement.pdf",
  type: "bank_statements",
  config: "chase_v2"
});

// Schema-validated, with provenance
console.log(result.parsed_document);

Python

from sensible import SensibleSDK

client = SensibleSDK(api_key="YOUR_API_KEY")

result = client.extract(
    url="https://example.com/statement.pdf",
    document_type="bank_statements",
    configuration="chase_v2"
)

# Every field includes source + confidence
print(result.parsed_document)

cURL

curl -X POST https://api.sensible.so/v0/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_url": "https://...",
    "document_type": "bank_statements",
    "configuration": "chase_v2"
  }'
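Once a response arrives, per-field confidence scores can drive routing: use high-confidence fields directly and send the rest to review. The sketch below assumes each entry in `parsed_document` is a dictionary with `value` and `confidence` keys; that shape, the `split_by_confidence` helper, and the 0.9 threshold are assumptions for illustration, so check the API docs for the actual response format.

```python
from typing import Any, Dict, Tuple


def split_by_confidence(
    parsed_document: Dict[str, Dict[str, Any]],
    threshold: float = 0.9,
) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Separate fields that can be used directly from fields needing review."""
    accepted: Dict[str, Any] = {}
    needs_review: Dict[str, Dict[str, Any]] = {}
    for name, field in parsed_document.items():
        # A missing confidence is treated as 0.0 so the field gets reviewed.
        if field.get("confidence", 0.0) >= threshold:
            accepted[name] = field["value"]
        else:
            needs_review[name] = field  # keep the full entry for the reviewer
    return accepted, needs_review
```

Keeping the whole entry for low-confidence fields preserves any source information the reviewer needs to find the value in the original document.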

Customer story

From prototype to production

"We replaced our internal GPT build with Sensible's hybrid extraction. Manual loss run review dropped from 15% to under 1%. The validation layer catches errors we would have missed — and we can trace every extraction back to the source document."

Engineering Team, Ledgebrook Insurance

73% less manual review
99%+ extraction accuracy
<2s average extraction time

Ready to extract with confidence?

Free tier with 100 documents per month. No credit card required.

Start extracting free | Book a technical demo
