Authentication

Every request is authenticated with a bearer token. API keys are prefixed by environment — cx_live_… for production and cx_test_… for sandboxed testing — and are sent in the Authorization header. Generate a key from your project settings; treat it like a password and never expose it in client-side code.

curl https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e \
  -H "Authorization: Bearer cx_live_8f3c2a9d4b6e1f0a7c5d2e9b4a8f1c6d"

Each key carries one of two scopes, so you can hand out read-only access without also granting permission to start crawls:

read — read-only access: poll crawl status, list issues and pages, and fetch or export reports.
crawl-trigger — a superset of read that can also start crawls.

A missing or invalid key returns 401. Requests for a project or resource your key cannot access return 404 rather than 403 — we never confirm the existence of resources outside your access.
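
The rules above can be sketched as two small helpers: one that builds the header, and one that maps the documented status codes to a likely cause. The names `authHeaders` and `describeAuthFailure` are hypothetical and not part of any CrawlX SDK; the prefix check assumes every key starts with one of the two prefixes described earlier.

```javascript
// Hypothetical helpers; the names are illustrative, not part of a CrawlX SDK.

// Build the Authorization header for a key loaded on the server side.
function authHeaders(apiKey) {
  if (!/^cx_(live|test)_/.test(apiKey)) {
    throw new Error("API key should start with cx_live_ or cx_test_");
  }
  return { Authorization: `Bearer ${apiKey}` };
}

// Map the documented status codes to a human-readable cause.
function describeAuthFailure(status) {
  if (status === 401) return "missing or invalid API key";
  // 404 is deliberately ambiguous: the resource may exist outside your access.
  if (status === 404) return "resource not found, or not accessible with this key";
  return null; // not an authentication or access failure
}
```

Because a 404 never confirms whether a resource exists, a client probably should not treat it as proof that an ID is wrong; checking which key and project are in use is a reasonable first step.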

Base URL

All endpoints are versioned and live under a single base URL. Paths below are relative to it.

https://api.crawlx.ai/v1

Endpoints

Trigger a crawl

POST /projects/{id}/crawls — start a crawl for a project. Requires the crawl-trigger scope. Triggering is asynchronous: a successful call returns 202 Accepted with a crawl_id you then poll. Pass an Idempotency-Key header to make retries safe — replaying the same key returns the original crawl instead of starting a new one.

curl -X POST https://api.crawlx.ai/v1/projects/p_1a2b3c4d/crawls \
  -H "Authorization: Bearer cx_live_…" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: 9d8c7b6a-5e4f-3a2b-1c0d-9e8f7a6b5c4d"

# 202 Accepted
{
  "crawl_id": "c_2f7a9e1b4d6c8a0e",
  "status": "queued"
}

Get crawl status

GET /crawls/{id} — poll a crawl. Returns its status, progress, and summary counts. Requires the read scope.

curl https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e \
  -H "Authorization: Bearer cx_live_…"

# 200 OK
{
  "crawl": {
    "id": "c_2f7a9e1b4d6c8a0e",
    "status": "running",
    "progress": 0.62,
    "pages_crawled": 3104,
    "issues_found": 87
  }
}
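
A polling loop around this endpoint might look like the sketch below. `waitForCrawl` and its options are hypothetical, and the loop assumes that "queued" and "running" are the only non-terminal statuses; the full set of status values is not listed here, so adjust the set if your crawls report others.

```javascript
// Sketch only: `waitForCrawl` is a hypothetical helper.
// Assumption: "queued" and "running" are the only non-terminal statuses.
const IN_PROGRESS = new Set(["queued", "running"]);

async function waitForCrawl(crawlId, apiKey, opts = {}) {
  const {
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    intervalMs = 5000,
    maxPolls = 120,
  } = opts;
  for (let i = 0; i < maxPolls; i++) {
    const res = await fetchImpl(`https://api.crawlx.ai/v1/crawls/${crawlId}`, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`poll failed with status ${res.status}`);
    const { crawl } = await res.json();
    if (!IN_PROGRESS.has(crawl.status)) return crawl; // terminal state reached
    await sleep(intervalMs);
  }
  throw new Error("crawl did not finish within the polling budget");
}
```

The interval and poll budget are arbitrary defaults. Where webhooks are available, reacting to crawl.completed and crawl.failed is likely cheaper than polling.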

List issues

GET /crawls/{id}/issues — list the issues found in a crawl, ordered by traffic impact. Supports limit and cursor query params for pagination. Requires the read scope.

curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/issues?limit=2" \
  -H "Authorization: Bearer cx_live_…"

# 200 OK
{
  "issues": [
    {
      "id": "i_7c1d9a3e",
      "type": "broken_internal_link",
      "severity": "high",
      "impact_score": 92,
      "pages_affected": 211
    },
    {
      "id": "i_4b8f2c6a",
      "type": "bad_canonical",
      "severity": "medium",
      "impact_score": 64,
      "pages_affected": 538
    }
  ],
  "next_cursor": "eyJvIjoyfQ"
}

List pages

GET /crawls/{id}/pages — list the pages crawled, with their status code and timing. Paginates the same way as issues. Requires the read scope.

curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/pages?limit=2" \
  -H "Authorization: Bearer cx_live_…"

# 200 OK
{
  "pages": [
    {
      "url": "https://example.com/products/trail-runner-gtx",
      "status_code": 200,
      "response_ms": 240,
      "issues": 0
    },
    {
      "url": "https://example.com/products/sku-2019-retired",
      "status_code": 404,
      "response_ms": 1100,
      "issues": 1
    }
  ],
  "next_cursor": "eyJvIjoyfQ"
}
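
Since issues and pages paginate the same way, one cursor loop can serve both. The sketch below is an async generator that follows next_cursor until it runs out. `listAll` is a hypothetical name, and the stopping condition assumes the last page omits next_cursor or sets it to null, which is not stated above.

```javascript
// Sketch only: `listAll` is a hypothetical helper.
// Assumption: the last page omits next_cursor (or sets it to null).
async function* listAll(crawlId, resource, apiKey, { fetchImpl = fetch, limit } = {}) {
  let cursor;
  do {
    const url = new URL(`https://api.crawlx.ai/v1/crawls/${crawlId}/${resource}`);
    if (limit !== undefined) url.searchParams.set("limit", String(limit));
    if (cursor) url.searchParams.set("cursor", cursor);
    const res = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    if (!res.ok) throw new Error(`list failed with status ${res.status}`);
    const page = await res.json();
    // The response array is keyed by the resource name: "issues" or "pages".
    yield* page[resource];
    cursor = page.next_cursor;
  } while (cursor);
}
```

A caller would consume it with `for await (const issue of listAll(crawlId, "issues", apiKey)) { … }`, which fetches the next page only when the current one is exhausted.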

Export report

GET /crawls/{id}/report — fetch or export a report. Use format=json (default) or format=csv, and type=issues or type=pages to select the dataset. Requires the read scope.

curl "https://api.crawlx.ai/v1/crawls/c_2f7a9e1b4d6c8a0e/report?format=csv&type=issues" \
  -H "Authorization: Bearer cx_live_…" \
  -o issues.csv
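
If you build the export URL in code rather than by hand, validating the two parameters up front avoids a round trip on a typo. This is a small sketch; `reportUrl` is a hypothetical helper, and it requires `type` explicitly because no default for it is documented above.

```javascript
// Sketch only: `reportUrl` is a hypothetical helper.
const REPORT_FORMATS = ["json", "csv"];
const REPORT_TYPES = ["issues", "pages"];

function reportUrl(crawlId, type, format = "json") {
  if (!REPORT_TYPES.includes(type)) {
    throw new RangeError(`type must be one of: ${REPORT_TYPES.join(", ")}`);
  }
  if (!REPORT_FORMATS.includes(format)) {
    throw new RangeError(`format must be one of: ${REPORT_FORMATS.join(", ")}`);
  }
  const url = new URL(
    `https://api.crawlx.ai/v1/crawls/${encodeURIComponent(crawlId)}/report`,
  );
  url.searchParams.set("format", format);
  url.searchParams.set("type", type);
  return url.toString();
}
```

For the crawl in the curl example, `reportUrl("c_2f7a9e1b4d6c8a0e", "issues", "csv")` produces the same URL that the curl command requests.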

Webhooks

The following event types are delivered to your webhook endpoint:

crawl.completed — a crawl finished successfully; data carries the crawl summary.
crawl.failed — a crawl ended in error; data carries the failure reason.
issue.found — a new issue was detected during a crawl.

Every delivery uses the same envelope. The event_id is stable across retries — use it to dedupe.

POST https://your-app.example.com/webhooks/crawlx
X-CrawlX-Event:      issue.found
X-CrawlX-Event-Id:   ev_5b3a1c9d7e2f4a6b
X-CrawlX-Signature:  t=1779743400,v1=4f1d…a9c2

{
  "event": "issue.found",
  "event_id": "ev_5b3a1c9d7e2f4a6b",
  "delivered_at": "2026-05-26T12:34:56.000Z",
  "project_id": "p_1a2b3c4d",
  "data": {
    "crawl_id": "c_2f7a9e1b4d6c8a0e",
    "issue_id": "i_7c1d9a3e",
    "type": "broken_internal_link",
    "severity": "high"
  }
}
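
Because event_id stays the same across retries, a receiver can drop repeats by remembering the IDs it has already handled. The sketch below keeps them in memory with a simple size cap. `createDeduper` and `maxEntries` are hypothetical names, and a multi-process deployment would more likely need a shared store with expiry.

```javascript
// Sketch only: an in-memory, single-process dedupe keyed on event_id.
function createDeduper(maxEntries = 10000) {
  const seen = new Set();
  return function isFirstDelivery(eventId) {
    if (seen.has(eventId)) return false; // retry of an event already handled
    seen.add(eventId);
    if (seen.size > maxEntries) {
      // Sets iterate in insertion order, so this evicts the oldest id.
      seen.delete(seen.values().next().value);
    }
    return true;
  };
}
```

In a handler this would be called once per delivery, after the signature check, for example `if (!isFirstDelivery(payload.event_id)) return;` before any side effects run.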

Verifying the signature

Each delivery is signed with the webhook's secret. The X-CrawlX-Signature header has the form t=<unix>,v1=<hmac>. The signature is an HMAC-SHA256 over the string `${t}.${body}` — the timestamp, a literal dot, then the exact raw request body. Verify on the raw body before parsing JSON, compare in constant time, and reject any request whose timestamp is more than 300 seconds from now to defeat replays.

import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_SECONDS = 300;

// body must be the RAW request body string (not re-serialized JSON).
function verifyWebhook(secret, body, header, now = Math.floor(Date.now() / 1000)) {
  const parts = Object.fromEntries(
    header.split(",").map((kv) => kv.split("=").map((s) => s.trim())),
  );
  const t = Number(parts.t);
  const sig = parts.v1;
  if (!t || !sig) return false;

  // Replay guard: reject stale timestamps (±5 minutes).
  if (Math.abs(now - t) > TOLERANCE_SECONDS) return false;

  // Recompute the HMAC and compare in constant time.
  const expected = createHmac("sha256", secret)
    .update(t + "." + body)
    .digest("hex");

  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

Rate limits

Requests are rate-limited per API key. When you exceed the limit, the API responds with 429 Too Many Requests and a Retry-After header giving the number of seconds to wait before retrying. Back off for at least that long.

HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{ "error": "rate_limited", "retry_after": 30 }
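
One way to respect the header is a thin wrapper that waits for the advertised number of seconds before trying again. `fetchWithRetry`, `maxRetries`, and the injectable `fetchImpl` and `sleep` are hypothetical names, and the one-second fallback for a missing or malformed header is an assumption rather than documented behavior.

```javascript
// Sketch only: `fetchWithRetry` is a hypothetical wrapper.
async function fetchWithRetry(url, init = {}, opts = {}) {
  const {
    fetchImpl = fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    maxRetries = 3,
  } = opts;
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, init);
    // Return anything that is not a 429, or the last 429 once retries run out.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Retry-After is documented as a number of seconds.
    const seconds = Number(res.headers.get("Retry-After"));
    // Assumption: fall back to 1 second if the header is missing or malformed.
    const waitSeconds = Number.isFinite(seconds) && seconds > 0 ? seconds : 1;
    await sleep(waitSeconds * 1000);
  }
}
```

The wrapper returns the final response even if it is still a 429, so the caller decides whether to give up or queue the work for later.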
