CrawlX API
A REST API and webhooks to trigger crawls, pull issues, and ship fixes from your own pipeline.
Authentication
Every request is authenticated with a bearer token. API keys are prefixed by environment — cx_live_… for production and cx_test_… for sandboxed testing — and are sent in the Authorization header. Generate a key from your project settings; treat it like a password and never expose it in client-side code.
curl https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e \
  -H "Authorization: Bearer cx_live_8f3c2a9d4b6e1f0a7c5d2e9b4a8f1c6d"
Keys carry one of two scopes, so you can hand out read-only access without granting write:
read: read-only access to poll crawl status, list issues and pages, and fetch or export reports.
crawl-trigger: a superset of read that can also start crawls.
A missing or invalid key returns 401. Requests for a project or resource your key cannot access return 404 rather than 403 — we never confirm the existence of resources outside your access.
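As a rough illustration of the rules above, a client might build the header and interpret the two failure codes as follows. The helper names `authHeaders` and `describeAuthFailure` are hypothetical and not part of the API; this is a sketch, not an official client.

```javascript
// Hypothetical helper: builds the Authorization header from an API key.
// It only checks the documented cx_live_ / cx_test_ prefixes.
function authHeaders(apiKey) {
  if (!/^cx_(live|test)_/.test(apiKey)) {
    throw new Error("expected a key starting with cx_live_ or cx_test_");
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// Hypothetical helper: maps the documented auth-related status codes to a hint.
function describeAuthFailure(status) {
  if (status === 401) return "missing or invalid API key";
  // 404 is used instead of 403, so it cannot tell you whether the resource exists.
  if (status === 404) return "resource not accessible with this key (or not found)";
  return null; // not an auth-related status covered by this section
}
```

Load the key from server-side configuration (for example an environment variable you choose) rather than embedding it in code that ships to a browser.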
Base URL
All endpoints are versioned and live under a single base URL. Paths below are relative to it.
https://api.crawlx.ai/v1
Endpoints
Trigger a crawl
POST /projects/{id}/crawls — start a crawl for a project. Requires the crawl-trigger scope. Triggering is asynchronous: a successful call returns 202 Accepted with a crawl_id you then poll. Pass an Idempotency-Key header to make retries safe — replaying the same key returns the original crawl instead of starting a new one.
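The curl call below shows the raw request. From Node, the same call might be sketched like this. `buildTriggerRequest` and `triggerCrawl` are hypothetical names, the global `fetch` assumes Node 18 or later, and the check uses `res.ok` because this page only states the status for a first successful call (202), not for a replayed key.

```javascript
const BASE_URL = "https://api.crawlx.ai/v1";

// Builds the URL and fetch options for POST /projects/{id}/crawls.
// The caller picks idempotencyKey once and reuses it on every retry
// of the same logical trigger.
function buildTriggerRequest(projectId, apiKey, idempotencyKey) {
  return {
    url: `${BASE_URL}/projects/${encodeURIComponent(projectId)}/crawls`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
        "Idempotency-Key": idempotencyKey,
      },
    },
  };
}

// Sends the request. fetchImpl can be swapped out in tests.
async function triggerCrawl(projectId, apiKey, idempotencyKey, fetchImpl = fetch) {
  const { url, options } = buildTriggerRequest(projectId, apiKey, idempotencyKey);
  const res = await fetchImpl(url, options);
  if (!res.ok) throw new Error(`trigger failed with status ${res.status}`);
  return res.json(); // expected to carry crawl_id and status, per the sample response
}
```

Keeping the key outside the function is deliberate: generating a fresh key inside a retry loop would defeat the idempotency guarantee.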
curl -X POST https://api.crawlx.ai/v1/projects/p_1a2b3c4d/crawls \
  -H "Authorization: Bearer cx_live_…" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: 9d8c7b6a-5e4f-3a2b-1c0d-9e8f7a6b5c4d"
# 202 Accepted
{
  "crawl_id": "c_2f7a9e1b4d6c8a0e",
  "status": "queued"
}
Get crawl status
GET /crawls/{id} — poll a crawl. Returns its status, progress, and summary counts. Requires the read scope.
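Because triggering is asynchronous, a pipeline usually wraps this endpoint in a polling loop. The sketch below is one way to do that. `waitForCrawl` and `makeGetCrawl` are hypothetical names, the interval and attempt budget are arbitrary, and it assumes that any status other than `queued` or `running` (the only two values shown on this page) means the crawl has ended.

```javascript
// Statuses shown in this document for a crawl that is still in flight.
// Assumption: any other status value means the crawl has ended.
const IN_PROGRESS = new Set(["queued", "running"]);

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// getCrawl() must resolve to the "crawl" object from GET /crawls/{id}.
async function waitForCrawl(
  getCrawl,
  { intervalMs = 5000, maxAttempts = 120, sleep = defaultSleep } = {},
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const crawl = await getCrawl();
    if (!IN_PROGRESS.has(crawl.status)) return crawl;
    await sleep(intervalMs);
  }
  throw new Error("crawl still in progress after the polling budget");
}

// Hypothetical fetch-based getter (global fetch, Node 18+).
function makeGetCrawl(crawlId, apiKey, fetchImpl = fetch) {
  const url = `https://api.crawlx.ai/v1/crawls/${encodeURIComponent(crawlId)}`;
  return async () => {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`status check failed with ${res.status}`);
    return (await res.json()).crawl;
  };
}
```

If you have webhooks configured, the `crawl.completed` and `crawl.failed` events can replace polling entirely.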
curl https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e \
  -H "Authorization: Bearer cx_live_…"
# 200 OK
{
  "crawl": {
    "id": "c_2f7a9e1b4d6c8a0e",
    "status": "running",
    "progress": 0.62,
    "pages_crawled": 3104,
    "issues_found": 87
  }
}
List issues
GET /crawls/{id}/issues — list the issues found in a crawl, ordered by traffic impact. Supports limit and cursor query params for pagination. Requires the read scope.
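To walk every page of results, a client can keep passing `next_cursor` back as the `cursor` param. The sketch below assumes the last page omits `next_cursor` or returns it empty, which this page does not state explicitly. `issuesUrl` and `collectAll` are hypothetical helper names.

```javascript
// Builds the URL for one page of GET /crawls/{id}/issues.
function issuesUrl(crawlId, { limit, cursor } = {}) {
  const url = new URL(
    `https://api.crawlx.ai/v1/crawls/${encodeURIComponent(crawlId)}/issues`,
  );
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  if (cursor) url.searchParams.set("cursor", cursor);
  return url.toString();
}

// Collects every item from a cursor-paginated endpoint.
// fetchPage(cursor) must resolve to the parsed response body for one page;
// itemsKey is the array field to read, e.g. "issues" or "pages".
async function collectAll(fetchPage, itemsKey) {
  const items = [];
  let cursor;
  do {
    const page = await fetchPage(cursor);
    items.push(...page[itemsKey]);
    cursor = page.next_cursor; // assumption: absent or empty on the last page
  } while (cursor);
  return items;
}
```

Since the pages endpoint paginates the same way, the same loop should work there with `itemsKey` set to `"pages"`.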
curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/issues?limit=2" \
  -H "Authorization: Bearer cx_live_…"
# 200 OK
{
  "issues": [
    {
      "id": "i_7c1d9a3e",
      "type": "broken_internal_link",
      "severity": "high",
      "impact_score": 92,
      "pages_affected": 211
    },
    {
      "id": "i_4b8f2c6a",
      "type": "bad_canonical",
      "severity": "medium",
      "impact_score": 64,
      "pages_affected": 538
    }
  ],
  "next_cursor": "eyJvIjoyfQ"
}
List pages
GET /crawls/{id}/pages — list the pages crawled, with their status code and timing. Paginates the same way as issues. Requires the read scope.
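Once you have the page list, a pipeline step might flag the entries worth a closer look. The sketch below is a minimal example that uses only the fields shown in the sample response that follows. `findProblemPages` is a hypothetical helper and the 1000 ms threshold is an arbitrary assumption.

```javascript
// Returns pages that errored, responded slowly, or carry at least one issue.
// slowMs is an illustrative threshold, not a CrawlX default.
function findProblemPages(pages, { slowMs = 1000 } = {}) {
  return pages.filter(
    (p) => p.status_code >= 400 || p.response_ms > slowMs || p.issues > 0,
  );
}
```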
curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/pages?limit=2" \
  -H "Authorization: Bearer cx_live_…"
# 200 OK
{
  "pages": [
    {
      "url": "https://example.com/products/trail-runner-gtx",
      "status_code": 200,
      "response_ms": 240,
      "issues": 0
    },
    {
      "url": "https://example.com/products/sku-2019-retired",
      "status_code": 404,
      "response_ms": 1100,
      "issues": 1
    }
  ],
  "next_cursor": "eyJvIjoyfQ"
}
Export report
GET /crawls/{id}/report — fetch or export a report. Use format=json (default) or format=csv, and type=issues or type=pages to select the dataset. Requires the read scope.
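A small builder can catch a mistyped `format` or `type` before the request goes out. The sketch below makes `type` a required argument because this page names a default only for `format`. `reportUrl` is a hypothetical helper name.

```javascript
const REPORT_FORMATS = ["json", "csv"];
const REPORT_TYPES = ["issues", "pages"];

// Builds the URL for GET /crawls/{id}/report with validated query params.
function reportUrl(crawlId, type, format = "json") {
  if (!REPORT_TYPES.includes(type)) {
    throw new Error(`type must be one of ${REPORT_TYPES.join(", ")}`);
  }
  if (!REPORT_FORMATS.includes(format)) {
    throw new Error(`format must be one of ${REPORT_FORMATS.join(", ")}`);
  }
  const url = new URL(
    `https://api.crawlx.ai/v1/crawls/${encodeURIComponent(crawlId)}/report`,
  );
  url.searchParams.set("format", format);
  url.searchParams.set("type", type);
  return url.toString();
}
```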
curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/report?format=csv&type=issues" \
  -H "Authorization: Bearer cx_live_…" \
  -o issues.csv
Webhooks
Register a webhook for a project and CrawlX will POST a signed JSON envelope when one of these events fires:
crawl.completed: a crawl finished successfully; data carries the crawl summary.
crawl.failed: a crawl ended in error; data carries the failure reason.
issue.found: a new issue was detected during a crawl.
Every delivery uses the same envelope. The event_id is stable across retries — use it to dedupe.
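Using the envelope fields shown in the sample that follows, deduplication and dispatch could be sketched as below. The in-memory `Set` is an assumption for illustration only; a deployment with more than one process would likely need a shared store with expiry. `createDeduper` and `handleDelivery` are hypothetical names.

```javascript
// Remembers recently seen event IDs, bounded to maxSize entries.
function createDeduper(maxSize = 10000) {
  const seen = new Set();
  return function isFirstDelivery(eventId) {
    if (seen.has(eventId)) return false;
    seen.add(eventId);
    // Sets iterate in insertion order, so the first value is the oldest.
    if (seen.size > maxSize) seen.delete(seen.values().next().value);
    return true;
  };
}

// Routes a parsed (and already signature-verified) envelope to a handler.
// handlers maps event names such as "issue.found" to functions.
function handleDelivery(envelope, isFirstDelivery, handlers) {
  if (!isFirstDelivery(envelope.event_id)) return "duplicate";
  const handler = handlers[envelope.event];
  if (!handler) return "ignored";
  handler(envelope.data, envelope);
  return "handled";
}
```

Run the signature check before this step so that unverified requests never reach the dedupe store.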
POST https://your-app.example.com/webhooks/crawlx
X-CrawlX-Event: issue.found
X-CrawlX-Event-Id: ev_5b3a1c9d7e2f4a6b
X-CrawlX-Signature: t=1779743400,v1=4f1d…a9c2

{
  "event": "issue.found",
  "event_id": "ev_5b3a1c9d7e2f4a6b",
  "delivered_at": "2026-05-26T12:34:56.000Z",
  "project_id": "p_1a2b3c4d",
  "data": {
    "crawl_id": "c_2f7a9e1b4d6c8a0e",
    "issue_id": "i_7c1d9a3e",
    "type": "broken_internal_link",
    "severity": "high"
  }
}
Verifying the signature
Each delivery is signed with the webhook's secret. The X-CrawlX-Signature header has the form t=<unix>,v1=<hmac>. The signature is an HMAC-SHA256 over the string `${t}.${body}` — the timestamp, a literal dot, then the exact raw request body. Verify on the raw body before parsing JSON, compare in constant time, and reject any request whose timestamp is more than 300 seconds from now to defeat replays.
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300;

// body must be the RAW request body string (not re-serialized JSON).
function verifyWebhook(secret, body, header, now = Math.floor(Date.now() / 1000)) {
  const parts = Object.fromEntries(
    header.split(",").map((kv) => kv.split("=").map((s) => s.trim())),
  );
  const t = Number(parts.t);
  const sig = parts.v1;
  if (!t || !sig) return false;
  // Replay guard: reject stale timestamps (±5 minutes).
  if (Math.abs(now - t) > TOLERANCE_SECONDS) return false;
  // Recompute the HMAC and compare in constant time.
  const expected = createHmac("sha256", secret)
    .update(t + "." + body)
    .digest("hex");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
Rate limits
Requests are rate-limited per API key. When you exceed the limit the API responds with 429 Too Many Requests and a Retry-After header (in seconds) telling you when to retry. Back off and respect it.
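One way to respect the header is to wrap each request in a small retry helper. The sketch below is illustrative only. `withRateLimitRetry` is a hypothetical name, and the retry cap and the 1-second fallback for a missing or non-numeric header are arbitrary assumptions.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// send() performs one request and resolves to a fetch-style Response.
// On 429 the helper waits for Retry-After seconds, then tries again,
// up to maxRetries extra attempts.
async function withRateLimitRetry(send, { maxRetries = 5, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const seconds = Number(res.headers.get("Retry-After"));
    // Fall back to 1 second if the header is missing or not a positive number.
    const waitSeconds = Number.isFinite(seconds) && seconds > 0 ? seconds : 1;
    await sleep(waitSeconds * 1000);
  }
}
```

The helper returns the final response either way, so the caller still sees a 429 if the retry budget runs out.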
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{ "error": "rate_limited", "retry_after": 30 }
Build on CrawlX.
Generate an API key in your project settings and trigger your first headless crawl in minutes.