Pagentic

Documentation

Pagentic exposes one customer-facing API surface (/v1/*) and a self-serve portal for managing keys, users, and viewing usage.

New here?

Read the platform overview — every feature, on one page.

API endpoint

All /v1/* calls go to:

https://pagentic-api.fly.dev

Hit the API host directly — don't route through pagentic.io, which redirects to www and will strip POST bodies. A custom api.pagentic.io CNAME is on the roadmap.

Authentication

Every call requires a Bearer token in the Authorization header:

Authorization: Bearer dv_live_abc12345_<random>

Mint keys from the portal. Two modes:

Each key carries a scope set: extractions:write, extractions:read, agents:read, usage:read. Mint with the narrowest scope you need — the portal lets you tick which to grant. Revoking a key takes effect immediately; old key strings are SHA-256 hashed at rest and never recoverable.

Quick start

  1. Get a key from the portal (or your operator).
  2. POST a PDF to /v1/extractions with agent=<agent_id>.
  3. Either set webhook_url for an HMAC-signed callback, or poll GET /v1/extractions/{id}.
  4. Fetch the structured envelope from GET /v1/extractions/{id}/result when status is succeeded.

# Submit (async: returns 202 immediately)
curl -H "Authorization: Bearer dv_live_..." \
     -F "agent=ltl_freight_invoice" \
     -F "document=@invoice.pdf" \
     -F "webhook_url=https://my-app/hook" \
     https://pagentic-api.fly.dev/v1/extractions

# Poll
curl -H "Authorization: Bearer dv_live_..." \
     https://pagentic-api.fly.dev/v1/extractions/ext_...

# Fetch structured result (only when status="succeeded")
curl -H "Authorization: Bearer dv_live_..." \
     https://pagentic-api.fly.dev/v1/extractions/ext_.../result

# List your agents (find the agent_id to use above)
curl -H "Authorization: Bearer dv_live_..." \
     https://pagentic-api.fly.dev/v1/agents

Agent IDs & version pinning

Agent IDs are slugs you choose at publish time. They must match ^[a-z][a-z0-9_]{2,40}$ — 3-41 chars, lowercase letters/digits/underscores, starting with a letter. Examples: ltl_freight_invoice, medical_eob, policy_appendix_v2.
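
A client-side check against the slug pattern above can catch typos before a request goes out. This is a minimal sketch: the helper name is_valid_agent_id is hypothetical, it validates the bare slug only, and the API remains the source of truth.

```python
import re

# Pattern quoted from the docs: one lowercase letter, then 2-40 more
# lowercase letters, digits, or underscores (3-41 characters in total).
AGENT_ID_RE = re.compile(r"^[a-z][a-z0-9_]{2,40}$")


def is_valid_agent_id(agent_id: str) -> bool:
    """Hypothetical helper: True if agent_id matches the documented slug pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return AGENT_ID_RE.fullmatch(agent_id) is not None
```

For example, is_valid_agent_id("medical_eob") passes, while "ab" (too short) and "2fast" (starts with a digit) do not.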

The agent form parameter accepts three forms:

Retired versions still resolve when explicitly pinned (so old integrations keep working) but are excluded from @latest. Archived agents return 403 agent_not_enabled for every request — unarchive from the portal to bring them back online.

Status lifecycle

An extraction moves through:

queued → processing → succeeded
                   ↘ pending_approval → succeeded
                                      ↘ rejected
                   ↘ failed
                   ↘ cancelled
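
If your client tracks job state locally, the diagram can be written down as a small transition table. This is one reading of the diagram, not an official state machine: it assumes failed and cancelled branch off processing as drawn, and the names TRANSITIONS, is_terminal, and can_transition are hypothetical.

```python
# Transitions as read from the lifecycle diagram (an interpretation).
TRANSITIONS = {
    "queued": {"processing"},
    "processing": {"succeeded", "pending_approval", "failed", "cancelled"},
    "pending_approval": {"succeeded", "rejected"},
    "succeeded": set(),
    "rejected": set(),
    "failed": set(),
    "cancelled": set(),
}

# A status is terminal when the diagram shows nothing after it.
TERMINAL = {status for status, nxt in TRANSITIONS.items() if not nxt}


def is_terminal(status: str) -> bool:
    """True once no further status change is expected."""
    return status in TERMINAL


def can_transition(current: str, new: str) -> bool:
    """True if the diagram draws an arrow from current to new."""
    return new in TRANSITIONS.get(current, set())
```

A poller can stop as soon as is_terminal(status) is true; can_transition is mainly useful for sanity-checking stored state.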

Approval queue (human-in-the-loop)

Each agent has a requires_approval setting, toggled from the portal's agent edit modal. When on, every completed extraction enters pending_approval and the webhook is held until a reviewer approves the result in the portal queue. Use this for sensitive workflows (legal, healthcare, large-dollar approvals) or while building trust with a new agent.

From the API caller's perspective:

# Submit (same as always)
curl -H "Authorization: Bearer dv_live_..." \
     -F "agent=medical_eob" \
     -F "document=@eob.pdf" \
     https://pagentic-api.fly.dev/v1/extractions

# Poll — note the new status while the reviewer is looking at it
curl -H "Authorization: Bearer dv_live_..." \
     https://pagentic-api.fly.dev/v1/extractions/ext_...
# → { "status": "pending_approval", ... }

# After approval (in the portal), the same poll returns:
# → { "status": "succeeded", "approved_at": "2026-...", "approved_by": "usr_..." }

Recommended client pattern. Treat pending_approval like processing — keep polling, just on a slower cadence (every 30-60s is plenty; humans aren't milliseconds-fast). When status flips to succeeded, the result is fetchable. When it flips to rejected, your downstream system should treat it the same as a manual cancellation and skip processing.
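
The pattern above might look like this in Python. It is a sketch under assumptions: get_status is a hypothetical callable you supply (for example, one that calls GET /v1/extractions/{id} and returns the status field), and the 2-second interval for queued and processing is a guess. Only the slower 30-60s cadence during review comes from the guidance above.

```python
import time
from typing import Callable

TERMINAL = {"succeeded", "rejected", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
    fast_interval: float = 2.0,       # assumed cadence while queued/processing
    approval_interval: float = 30.0,  # slower cadence while a human reviews
    max_polls: int = 1000,
) -> str:
    """Call get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        # pending_approval is treated like processing, just polled less often.
        sleep(approval_interval if status == "pending_approval" else fast_interval)
    raise TimeoutError("extraction did not reach a terminal status")
```

Fetch the result only when the returned status is succeeded; treat rejected like a manual cancellation. Passing sleep as a parameter keeps the loop easy to test without real delays.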

Sync vs async

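Two modes on submission, controlled by the mode form field, which must be sync or async. In async mode the call returns 202 immediately with a job summary, and you poll or wait for the webhook. In sync mode the call is capped at the tenant's sync_page_limit (larger documents get 413 sync_page_limit_exceeded), and an extraction that runs past 60s returns 504 sync_timeout while the job continues async.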

Result envelope

The shape of result is dictated by the agent's output schema, but every agent wraps each leaf in a field-wrapper for provenance:

{
  "_meta": {
    "notes": [
      "Page 2 line item 3: subtotal $1,213.50 vs. sum of extended prices $1,213.45 (off by $0.05; likely rounding).",
      "OCR appears to read 'afer' on page 1 — preserved verbatim."
    ]
  },
  "invoice_number": {
    "value": "INV-238417",
    "verbatim": "Invoice No. INV-238417",
    "pages": [1],
    "confidence": "high",
    "notes": null
  },
  "bill_to_name": {
    "value": "Acme Logistics, Inc.",
    "verbatim": "Bill To: Acme Logistics, Inc.",
    "pages": [1],
    "confidence": "high",
    "notes": null
  },
  "line_items": [
    {
      "description": { "value": "Freight, LTL, ATL→DFW", "verbatim": "Freight LTL ATL→DFW", "pages": [2], "confidence": "high", "notes": null },
      "extended":    { "value": 412.50, "verbatim": "412.50", "pages": [2], "confidence": "high", "notes": null }
    }
  ]
}

Every leaf carries value, verbatim, pages, confidence, and notes. That's how you audit a value back to the original PDF: verbatim is the literal text from the source, and pages lists the 1-based PDF page numbers it came from.

_meta.notes is the data-quality channel: spelling/OCR errors, math discrepancies, missing fields, conflicting values, ambiguous interpretations. Surface this to operators; it's what catches integration errors that would otherwise hit your downstream system silently.
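
Downstream code usually wants plain values plus a list of fields worth a second look. The sketch below walks an envelope shaped like the example above. The function names are hypothetical, and since the example only shows "high" as a confidence value, anything else is treated as needing review (an assumption).

```python
from typing import Any

LEAF_KEYS = {"value", "verbatim", "pages", "confidence", "notes"}


def is_leaf(node: Any) -> bool:
    """A field-wrapper is a dict carrying all five provenance keys."""
    return isinstance(node, dict) and LEAF_KEYS.issubset(node)


def plain_values(node: Any) -> Any:
    """Replace every field-wrapper with its bare value; skip _meta."""
    if is_leaf(node):
        return node["value"]
    if isinstance(node, dict):
        return {k: plain_values(v) for k, v in node.items() if k != "_meta"}
    if isinstance(node, list):
        return [plain_values(v) for v in node]
    return node


def fields_needing_review(node: Any, path: str = "") -> list[str]:
    """Paths of leaves whose confidence is not "high" or that carry notes."""
    if is_leaf(node):
        flagged = node["confidence"] != "high" or node["notes"]
        return [path] if flagged else []
    if isinstance(node, dict):
        items = [(f"{path}.{k}" if path else k, v)
                 for k, v in node.items() if k != "_meta"]
    elif isinstance(node, list):
        items = [(f"{path}[{i}]", v) for i, v in enumerate(node)]
    else:
        items = []
    return [p for sub_path, v in items for p in fields_needing_review(v, sub_path)]
```

Run against the example envelope, plain_values would yield a dict with invoice_number, bill_to_name, and a line_items list of plain dicts. Keep the original envelope around as well, since verbatim and pages are what make an audit possible.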

Job summary shape

POST /v1/extractions (async) and GET /v1/extractions/{id} return:

{
  "id": "ext_01JKZX...",
  "status": "queued",
  "agent": "ltl_freight_invoice",
  "agent_version": 5,
  "created_at": "2026-04-30T18:42:01.123Z",
  "metadata": { "your": "passthrough" },
  "estimated_completion_at": "2026-04-30T18:43:30.000Z",
  "links": {
    "self":   "/v1/extractions/ext_01JKZX...",
    "result": "/v1/extractions/ext_01JKZX.../result"
  }
}

On succeeded, completed_at is set. On failed, an error object is included alongside completed_at.

Webhooks & signature verification

There are two ways to point an extraction at your endpoint. Either pass webhook_url on the submission, or set a default webhook URL on the agent itself (Portal → Agents → Edit) so every extraction for that agent posts there automatically.

Precedence (most-specific wins):

  1. disable_webhook=true on the call — no webhook fires for this extraction, period.
  2. webhook_url on the call — that URL is used.
  3. Agent default_webhook_url (set in the UI) — fallback when no per-call URL is given.
  4. None of the above — no webhook fires; you poll GET /v1/extractions/{id}.

Opting out for one call. If your agent has a default URL but you need to suppress delivery for a single extraction (e.g. backfilling old documents you don't want to re-trigger downstream workflows for), send disable_webhook=true:

# Use the agent's configured webhook
curl -H "Authorization: Bearer dv_live_..." \
     -F "agent=ltl_freight_invoice" \
     -F "document=@invoice.pdf" \
     https://pagentic-api.fly.dev/v1/extractions

# Override for this one call
curl -H "Authorization: Bearer dv_live_..." \
     -F "agent=ltl_freight_invoice" \
     -F "document=@invoice.pdf" \
     -F "webhook_url=https://staging.example.com/hook" \
     https://pagentic-api.fly.dev/v1/extractions

# Suppress webhook entirely for this call
curl -H "Authorization: Bearer dv_live_..." \
     -F "agent=ltl_freight_invoice" \
     -F "document=@invoice.pdf" \
     -F "disable_webhook=true" \
     https://pagentic-api.fly.dev/v1/extractions

Combining disable_webhook=true with webhook_url in the same call returns 400 contradictory_webhook — pick one.
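
The precedence rules can be expressed as a small function, which is handy for predicting where a given call will deliver. This is a sketch of the documented rules, not the server's code; resolve_webhook_url and ContradictoryWebhook are hypothetical names.

```python
from typing import Optional


class ContradictoryWebhook(ValueError):
    """Mirrors the documented 400 contradictory_webhook case."""


def resolve_webhook_url(
    disable_webhook: bool,
    call_webhook_url: Optional[str],
    agent_default_webhook_url: Optional[str],
) -> Optional[str]:
    """Return the URL a webhook would be delivered to, or None for no delivery."""
    if disable_webhook and call_webhook_url:
        raise ContradictoryWebhook(
            "disable_webhook=true cannot be combined with webhook_url"
        )
    if disable_webhook:
        return None                   # 1. explicit opt-out wins
    if call_webhook_url:
        return call_webhook_url       # 2. per-call URL
    return agent_default_webhook_url  # 3. agent default, or 4. None (poll instead)
```

A None result means no webhook fires and the caller is expected to poll GET /v1/extractions/{id}.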

Once the job lands in a terminal state we POST the body (the same payload the corresponding GET would return) with these headers:

X-Pagentic-Event:     extraction.succeeded   (or extraction.failed)
X-Pagentic-Signature: t=1714502531,v1=<hex-hmac>
X-Pagentic-Delivery:  whd_12345

Signature is HMAC-SHA256 of {ts}.{raw_body} using your tenant's webhook secret. Always verify before trusting the payload. Python and Node samples:

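# Python (Flask example)
import hashlib
import hmac
import time

SECRET = b"<your tenant webhook secret>"

def verify(req):
    """Return True only if the signature header matches the raw request body."""
    sig = req.headers.get("X-Pagentic-Signature", "")   # "t=...,v1=..."
    try:
        parts = dict(p.split("=", 1) for p in sig.split(","))
        ts, v1 = parts["t"], parts["v1"]
        if abs(time.time() - int(ts)) > 300:
            return False  # reject timestamps more than 5 minutes off (replays)
    except (KeyError, ValueError):
        return False  # header missing or malformed
    # Sign "{ts}.{raw_body}" over the raw bytes, not re-serialized JSON.
    mac = hmac.new(SECRET, f"{ts}.".encode() + req.get_data(), hashlib.sha256)
    return hmac.compare_digest(mac.hexdigest().encode(), v1.encode())

// Node (Express example)
import crypto from "node:crypto";
const SECRET = "<your tenant webhook secret>";

export function verify(req, raw) {  // raw = the unparsed request body
  const sig = req.headers["x-pagentic-signature"];
  if (typeof sig !== "string") return false;  // header missing
  const { t, v1 } = Object.fromEntries(sig.split(",").map(s => s.split("=")));
  if (!t || !v1) return false;  // header malformed
  const age = Math.abs(Date.now() / 1000 - Number(t));
  if (!(age <= 300)) return false;  // older than 5 minutes, or t is not a number
  const mac = crypto.createHmac("sha256", SECRET)
                    .update(`${t}.`).update(raw).digest("hex");
  const a = Buffer.from(mac), b = Buffer.from(v1);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}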
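
To exercise a verifier like the ones above without waiting for a real delivery, you can build a signature header yourself from the documented format. This is a test helper sketch: the function name sign is hypothetical, and it assumes the v1 value is the lowercase hex digest, as the samples above imply.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign(secret: bytes, raw_body: bytes, ts: Optional[int] = None) -> str:
    """Build an X-Pagentic-Signature header value for raw_body (test helper)."""
    ts = int(time.time()) if ts is None else ts
    # HMAC-SHA256 over "{ts}.{raw_body}", matching the documented scheme.
    digest = hmac.new(secret, f"{ts}.".encode() + raw_body, hashlib.sha256).hexdigest()
    return f"t={ts},v1={digest}"
```

In a unit test, sign the exact bytes you post to your handler and confirm it accepts them, then flip one byte of the body or use a timestamp more than five minutes old and confirm it rejects.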

Retry schedule after a non-2xx response or transport error:

attempt 1 fails → wait 1 min
attempt 2 fails → wait 5 min
attempt 3 fails → wait 30 min
attempt 4 fails → wait 2 h
attempt 5 fails → wait 12 h
attempt 6 fails → dead-letter (visible in the portal; no further retries)

Respond with any 2xx within 10 seconds to count as delivered. Long-running handlers should ack first, then enqueue work locally. You can replay dead-lettered deliveries from the portal's webhooks panel.
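
Adding up the schedule shows how long an endpoint can be down before a delivery is dead-lettered. The sketch below assumes each wait starts when the previous attempt fails and ignores the time the attempts themselves take; the names are hypothetical.

```python
from datetime import timedelta

# Waits after failed attempts 1-5, as listed above; attempt 6 has no retry.
RETRY_WAITS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=12),
]


def attempt_offsets() -> list[timedelta]:
    """Earliest offset of each of the six attempts from the first delivery."""
    offsets = [timedelta(0)]
    for wait in RETRY_WAITS:
        offsets.append(offsets[-1] + wait)
    return offsets


print(attempt_offsets()[-1])  # 14:36:00, the offset of the sixth and final attempt
```

So under these assumptions an outage has to last a little over 14.5 hours before a delivery ends up dead-lettered.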

Idempotency

Pass an X-Idempotency-Key header on POST /v1/extractions. If we've seen the same key from your tenant before, we replay the original response — same status code, same body — with an additional Idempotent-Replayed: true header.

curl -H "Authorization: Bearer dv_live_..." \
     -H "X-Idempotency-Key: f8b1c0e2-..." \
     -F "agent=ltl_freight_invoice" \
     -F "document=@invoice.pdf" \
     https://pagentic-api.fly.dev/v1/extractions

Use a UUID per logical operation, and reuse it on every retry of that operation, so a retry after a network glitch doesn't trigger a duplicate extraction. Keys are scoped per tenant and retained per your retention window (default 30 days).
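
The important detail is that the key is generated once per logical submission and reused across retries. A minimal sketch, assuming a hypothetical send callable that performs the POST with the given headers, returns the HTTP status, and raises ConnectionError on transport failures:

```python
import uuid
from typing import Callable


def submit_with_retries(send: Callable[[dict], int], max_attempts: int = 3) -> int:
    """Call send(headers) until it returns, reusing one idempotency key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical operation, generated outside the retry loop.
    headers = {"X-Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(headers)
        except ConnectionError:
            # The outcome is unknown; the same key makes a retry safe.
            if attempt == max_attempts:
                raise
```

Generating a fresh UUID inside the loop would defeat the purpose, since each retry would then look like a new operation.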

Error envelope & codes

All /v1/* errors return:

{
  "error": {
    "type":    "permission_error",
    "code":    "agent_not_enabled",
    "message": "Agent 'ltl_freight_invoice' is not enabled for this tenant.",
    "param":   "agent",
    "request_id": "req_01JKZX..."
  }
}

Common codes:

HTTP  code                        When
400   missing_document            Neither document nor document_url was provided.
400   invalid_mode                mode must be 'sync' or 'async'.
400   invalid_agent_version       Pinned an integer version that doesn't exist.
400   agent_version_retired       Pinned a retired version explicitly.
400   invalid_metadata            metadata field isn't valid JSON or > 4 KB.
400   contradictory_webhook       disable_webhook=true was sent together with webhook_url. Pick one.
401   invalid_api_key             Bearer header missing or unrecognized.
403   agent_not_enabled           The agent isn't entitled for your tenant (or is archived).
403   insufficient_scope          Your key doesn't carry the scope this endpoint requires.
404   agent_not_found             Unknown agent_id.
404   extraction_not_found        Unknown extraction id, or it belongs to another tenant.
409   extraction_not_ready        GET /result called before status='succeeded' (e.g. still processing, pending_approval, rejected, or failed).
409   extraction_not_cancellable  POST /cancel called after the job already finished.
413   max_pages_exceeded          PDF has more pages than the agent's max_pages.
413   sync_page_limit_exceeded    Sync mode caps at the tenant's sync_page_limit.
429   rate_limit_exceeded         Too many requests for this scope; retry after the Reset header.
504   sync_timeout                Sync extraction exceeded 60s; the job continues async.
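
Client code is easier to write against one exception type than against raw JSON. The sketch below turns the documented envelope into an exception; PagenticError and error_from_response are hypothetical names, and the fallback for bodies that aren't the documented envelope is an assumption.

```python
import json
from typing import Optional


class PagenticError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, http_status: int, error: dict):
        super().__init__(error.get("message", ""))
        self.http_status = http_status
        self.type = error.get("type")
        self.code = error.get("code")
        self.param = error.get("param")
        self.request_id = error.get("request_id")


def error_from_response(http_status: int, body: str) -> Optional[PagenticError]:
    """Return a PagenticError for 4xx/5xx responses, or None otherwise."""
    if http_status < 400:
        return None
    try:
        error = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        error = None
    if not isinstance(error, dict):
        # Not the documented envelope (for example, a proxy error page).
        error = {"message": body}
    return PagenticError(http_status, error)
```

Branch on err.code rather than on the message text, and include err.request_id in logs so a failed call can be traced later.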

Rate limits

Per-key, per-minute, by scope:

extractions:write   100 / min
extractions:read   1000 / min
agents:read        1000 / min
usage:read          100 / min

Every response carries:

RateLimit-Limit:     100
RateLimit-Remaining: 87
RateLimit-Reset:     42        # seconds until the window rolls over

Over-limit calls return 429 rate_limit_exceeded. If you need higher limits, mention it when you reach out — we set them per tenant.
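
A client can use the RateLimit-* headers to pause before it runs into a 429. A minimal sketch: seconds_to_wait is a hypothetical helper, the 1-second fallback when a 429 arrives without a Reset header is an assumption, and the lookup expects the header casing shown above (or a case-insensitive mapping).

```python
from typing import Mapping


def seconds_to_wait(
    status: int,
    headers: Mapping[str, str],
    fallback: float = 1.0,  # assumed pause if RateLimit-Reset is absent
) -> float:
    """How long to pause before the next call, based on the RateLimit-* headers."""
    reset_raw = headers.get("RateLimit-Reset")
    reset = float(reset_raw) if reset_raw is not None else fallback
    remaining = int(headers.get("RateLimit-Remaining", 1))
    if status == 429 or remaining <= 0:
        return reset  # budget exhausted: wait for the window to roll over
    return 0.0
```

With the example headers above (87 remaining, reset in 42 seconds) the function returns 0.0; once Remaining reaches 0 it returns the Reset value.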

Billing & usage

Each successful extraction emits a usage event with pages, token counts, and the per-page price applied. Roll-ups are visible at portal/usage. Per-page rates are quoted up front per agent type — token usage is reported on the event but not directly billed.

Need more?

Email hello@pagentic.com with the agent type you're working on. We'll get you a sandbox key and walk through the schema for your specific document type.