What is CrumbTrail?

CrumbTrail is a page scanner for understanding which third-party services a website pulls in. It focuses on the external scripts, embeds, cookies, technologies, and findings that can be inferred from a single public page load.

The goal is visibility. Instead of reducing a site to one score, CrumbTrail shows the actual vendors, domains, and technical signals behind the page so you can inspect what is being loaded.

How the scan works

  1. You submit a public page URL to the scanner.
  2. The API validates the URL and blocks private or internal hosts before making a request.
  3. The server fetches the page HTML, follows redirects, and captures response headers and Set-Cookie values.
  4. Cheerio parses the HTML to extract external scripts, iframes, and network hints such as preconnect, dns-prefetch, and preload links.
  5. Third-party hostnames are compared against the tracker dataset and grouped into analytics, ads, behavior tracking, social, CDN/infrastructure, or unknown.
  6. The scan fingerprints technologies from HTML, headers, cookies, and known asset patterns, analyzes security headers, probes a short list of common paths, and extracts cookie names seen during the request.
  7. Results are shown as tracker groups, a domain map, detected technologies, security and privacy findings, and observed cookie names.
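The host check in step 2 can be sketched as follows. This is an illustrative stdlib example, not the scanner's actual code (which runs on Node.js); the function name and blocklist entries are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames that should never be fetched, regardless of DNS (illustrative).
BLOCKED_HOSTNAMES = {"localhost"}

def is_safe_url(raw_url: str) -> bool:
    """Reject non-HTTP schemes and literal private/internal addresses."""
    parsed = urlparse(raw_url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host or host.lower() in BLOCKED_HOSTNAMES:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP. A production check would also resolve DNS
        # and re-validate every resulting address before connecting.
        return True
    return not (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved)
```

A real implementation must repeat this check after DNS resolution and on every redirect hop, otherwise a public hostname pointing at an internal address slips through.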

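The extraction in step 4 can be approximated with a standard HTML parser. The real scanner uses Cheerio; this stdlib sketch only shows the kind of tags and attributes being collected.

```python
from html.parser import HTMLParser

class ExternalRefParser(HTMLParser):
    """Collect script sources, iframe embeds, and network hints."""

    def __init__(self):
        super().__init__()
        self.scripts, self.iframes, self.hints = [], [], []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script" and a.get("src"):
            self.scripts.append(a["src"])
        elif tag == "iframe" and a.get("src"):
            self.iframes.append(a["src"])
        elif tag == "link" and a.get("rel") in ("preconnect", "dns-prefetch", "preload"):
            if a.get("href"):
                self.hints.append((a["rel"], a["href"]))

def extract_refs(html: str):
    parser = ExternalRefParser()
    parser.feed(html)
    return parser.scripts, parser.iframes, parser.hints
```

Each collected URL is then reduced to its hostname and checked against the page's own origin to decide whether it counts as third-party.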
What the results mean

Tracker groups show which third-party services were referenced, where they came from, and how they were classified. The Domain Map complements that view by summarizing the external hostnames connected to the scanned page, while Tech Stack lists frameworks, platforms, servers, libraries, and other technologies inferred from the response.
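The grouping behind the tracker view works roughly like the sketch below. The lookup table stands in for the scanner's much larger tracker dataset; these entries and category labels are illustrative only.

```python
from urllib.parse import urlparse

# A tiny stand-in for the tracker dataset: registrable domain -> category.
TRACKER_DATASET = {
    "google-analytics.com": "analytics",
    "doubleclick.net": "ads",
    "hotjar.com": "behavior tracking",
    "connect.facebook.net": "social",
    "cloudflare.com": "cdn/infrastructure",
}

def classify_host(url: str) -> tuple[str, str]:
    """Match a hostname (or any of its parents) against the dataset."""
    host = urlparse(url).hostname or ""
    for suffix, category in TRACKER_DATASET.items():
        if host == suffix or host.endswith("." + suffix):
            return host, category
    return host, "unknown"
```

Suffix matching is what lets www.google-analytics.com inherit the category of google-analytics.com, while anything absent from the dataset falls through to unknown.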

Security Findings highlight missing headers, exposed paths, and privacy-related risks inferred from the detected services. The cookie section lists cookie names observed during the request, taken from response headers or inline document.cookie assignments, so you can see which identifiers were exposed during the scan.
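Two of the simpler checks behind these findings can be sketched directly: flagging absent security headers and pulling cookie names out of Set-Cookie values. The header list here is a common baseline, not the scanner's exact rule set.

```python
# Headers whose absence is commonly flagged (illustrative baseline).
EXPECTED_HEADERS = [
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
]

def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return expected headers not present in the response (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [h for h in EXPECTED_HEADERS if h not in present]

def cookie_names(set_cookie_values: list[str]) -> list[str]:
    """The cookie name is everything before '=' in the first attribute."""
    return [value.split(";", 1)[0].split("=", 1)[0].strip()
            for value in set_cookie_values]
```

Only names are kept, which is why the report can show which identifiers a page sets without storing their values.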

Limitations

The default scan is based on the initial HTML response and related metadata, so it should be treated as a static analysis pass rather than a full browser session. Trackers injected after page load, inside authenticated flows, or behind user interaction may be missed entirely.

Detection also depends on known hostname and fingerprint patterns, which means unknown, custom, or self-hosted tooling may appear as unknown or not be identified at all. Cookie results reflect names observed during the fetch, and the overall result represents a single page request rather than an entire site, logged-in experience, or multi-step journey.