ConsentLens Scanning Methodology
ConsentLens scans websites using a headless Chromium browser controlled by Playwright, configured to simulate an EU user visiting a site for the first time. The scanner uses passive observation — it does not click, scroll, or interact with the page — to capture the natural default state of the website before any user interaction. This document describes the full technical methodology: how the browser is configured, how the pre-consent detection window works, how tracker confidence is calculated, and how compliance issues are generated from the raw data.
Scanner Architecture Overview
The scanner consists of a Next.js API route that initiates the scan, a Playwright browser automation layer that performs the page load and data collection, and a post-processing analysis layer that converts raw browser data into compliance scores and issues. Scans are initiated asynchronously — the API returns a scan ID immediately, and the browser work proceeds in the background. The client polls for results at two-second intervals.
Each scan uses a fresh browser context — no cookies, no cached data, no session state from previous scans. This ensures that every scan represents a genuine first visit from a clean browser state, which is the most privacy-relevant scenario. A website's GDPR compliance should be evaluated based on what happens to a new visitor, not a returning user who has already interacted with the site.
The scanner has a hard 15-second timeout on page load. If the page does not load within this window, the scan is marked as failed with an error reason. This timeout is intentionally conservative — it prevents infinite waits on sites with excessive loading dependencies while providing enough time for most production websites to complete their initial render and fire all on-load scripts.
EU Browser Context: Locale and Geolocation
All scans are performed with the browser context configured to simulate a user in Berlin, Germany: locale 'de-DE', timezone 'Europe/Berlin', and browser geolocation set to Berlin coordinates (52.5200° N, 13.4050° E). This configuration is deliberately chosen to trigger EU-specific consent banners and cookie notices that many websites serve only to visitors from EU IP addresses or with EU-regional browser settings.
Without this configuration, a scanner running from a US or non-EU server would often receive a version of the website without a cookie notice, because the site is geolocation-gating its consent mechanism to EU users only. This would produce a false-negative result: the scanner would find no banner and no consent mechanism, not because the site is non-compliant, but because the compliance mechanism is hidden from non-EU requests. The EU locale and geolocation settings ensure that the scan captures the experience of the users the regulation protects.
The Accept-Language header is set to 'de-DE,de;q=0.9,en;q=0.8' to further signal an EU user context. Some sites serve different cookie policy implementations based on this header in combination with geolocation. The browser's navigator.language property also returns the configured locale, which some CMP implementations use to determine which consent UI to display.
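The settings described in this section can be collected into a single options object. The following is a minimal sketch, not the ConsentLens source: the field names follow the options accepted by Playwright's browser.newContext(), while the constant name EU_CONTEXT_OPTIONS, the interface name, and the decision to grant the geolocation permission are assumptions made for illustration.

```typescript
// Shape of the subset of Playwright context options used in this sketch.
interface EuContextOptions {
  locale: string;
  timezoneId: string;
  geolocation: { latitude: number; longitude: number };
  permissions: string[];
  extraHTTPHeaders: Record<string, string>;
}

// Hypothetical constant: simulates a first-time visitor in Berlin, Germany.
const EU_CONTEXT_OPTIONS: EuContextOptions = {
  locale: "de-DE", // also drives navigator.language in the page
  timezoneId: "Europe/Berlin",
  geolocation: { latitude: 52.52, longitude: 13.405 }, // Berlin
  // Assumption: the permission is granted so pages can read the coordinates.
  permissions: ["geolocation"],
  // Explicit header so the q-values match the ones stated in the text.
  extraHTTPHeaders: { "Accept-Language": "de-DE,de;q=0.9,en;q=0.8" },
};

// Usage with Playwright (shown as comments, not executed here):
//   const browser = await chromium.launch({ headless: true });
//   const context = await browser.newContext(EU_CONTEXT_OPTIONS); // fresh state
//   const page = await context.newPage();
//   await page.goto(url, { timeout: 15_000 }); // the 15-second hard timeout
```

Creating a new context per scan is what gives each scan a clean cookie jar and cache. Note that these options change what the browser reports about itself; they do not change the IP address the request originates from.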
Passive Observation Principle
The scanner observes pages passively: it does not click, scroll, fill forms, or interact with the page in any way. This design decision reflects the legal context of the scanner's purpose. GDPR and ePrivacy require that tracking consent be obtained before processing begins, not at some later point once the user has interacted with the page. A page that fires trackers on load, before any click, violates this requirement regardless of what would happen if a user scrolled or clicked.
Active interaction — for example, clicking an accept button on the consent banner — would alter the measured state. If the scanner accepted cookies and then measured tracker behaviour, it would be measuring post-consent behaviour, which is not the compliance-relevant scenario. The compliance question is: what happens to a user who arrives at this page and has not yet made any consent decision?
Passive observation also means the scanner does not attempt to detect what a site would do differently after consent is given or withdrawn. The scan result reflects the pre-interaction state of the page. Some sites may load additional trackers after consent is given — this is expected and generally compliant. Some sites may continue to load the same trackers regardless of consent decisions — this is non-compliant, but detecting it would require interactive testing that is outside the scope of a first-visit scan.
The 500ms Pre-Consent Detection Window
The scanner records a timestamp called 'pageLoadStartTime' at the moment the page navigation begins. All subsequent network requests are timestamped relative to this anchor. A tracker is classified as 'fired before consent' if its first data-transmission network request (type: fetch, xhr, ping, or other — not script, document, stylesheet, font, or media) occurs within the first 500 milliseconds of page load.
The 500ms threshold is based on the practical impossibility of a user reading, evaluating, and interacting with a consent banner in less than half a second. In testing across hundreds of real consent banner implementations, the fastest humanly achievable consent interaction requires approximately 1.2–1.8 seconds from page render to button click. The 500ms window therefore captures the full population of requests that could not have been consented to, with a safety margin.
This threshold is conservative in the correct direction: it may miss some pre-consent tracking that occurs between 500ms and 1.5 seconds (the user is unlikely to have consented in this window, but the scanner does not flag it). It will not produce false positives — a tracker firing at 450ms could not plausibly have waited for consent. DPAs in France, the Netherlands, and Germany have used similar timing-based analysis in enforcement investigations.
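The classification rule for the detection window can be sketched as a small pure function. This is an illustration under stated assumptions rather than the production code: the LoggedRequest shape and the function name firedBeforeConsent are hypothetical, the request list is assumed to contain only the requests attributed to one tracker, and a request at exactly 500ms is treated as inside the window.

```typescript
const PRE_CONSENT_WINDOW_MS = 500;

// Resource types the text counts as data transmission.
const DATA_TRANSMISSION_TYPES = ["fetch", "xhr", "ping", "other"];

// Hypothetical record for one logged network request.
interface LoggedRequest {
  url: string;
  resourceType: string; // e.g. "script", "xhr", "document"
  timestampMs: number;  // absolute time, same clock as pageLoadStartTime
}

// True when the tracker's first data-transmission request falls inside the
// 500ms window measured from the start of navigation.
function firedBeforeConsent(
  trackerRequests: LoggedRequest[],
  pageLoadStartTime: number
): boolean {
  const transmissions = trackerRequests.filter(
    (r) => DATA_TRANSMISSION_TYPES.indexOf(r.resourceType) !== -1
  );
  if (transmissions.length === 0) {
    return false; // script loads alone never count as transmission
  }
  const first = Math.min(...transmissions.map((r) => r.timestampMs));
  return first - pageLoadStartTime <= PRE_CONSENT_WINDOW_MS;
}
```

Only the first transmission matters here: a tracker whose script downloads at 100ms but whose first xhr goes out at 900ms is not flagged, while one that sends an xhr at 450ms is.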
Tracker Detection: The Four-Pass Confidence Model
Tracker detection uses a four-pass model that assigns confidence weights to different detection signals, aggregates them, and classifies trackers based on the resulting confidence score. The four passes are: network domain matching (weight 0.6), window global variable detection (weight 0.5), script source URL pattern matching (weight 0.3), and inline script snippet matching (weight 0.2). A single network-domain match produces a confidence of 0.6, which meets the 0.6 threshold for 'confirmed' on its own. A script source URL match alone produces 0.3, which meets the 0.3 threshold for 'probable'.
Network domain matching is the highest-weight signal because a data-transmission request to a known tracker domain (google-analytics.com, connect.facebook.net, static.hotjar.com, doubleclick.net, etc.) is the strongest available evidence that the tracker is actively collecting data. Window global detection — the presence of objects like window.ga, window.fbq, or window._hjSettings — confirms that a tracker's script has loaded and initialised. These two signals together produce a confidence of 1.1, capped at 1.0, which is a confirmed detection.
An inline-only detection penalty of -0.15 is applied to trackers identified only through inline script pattern matching, without any network domain or global variable confirmation. This reduces false-positive rates for pages that reference tracker patterns (such as documentation pages or code-sharing sites) without actually running those trackers. An inline-only match therefore scores 0.2 - 0.15 = 0.05 and is discarded. The final confidence score is capped at 1.0. Trackers with confidence at or above 0.6 are classified as confirmed; those at or above 0.3 but below 0.6 are classified as probable. Detections below 0.3 are discarded.
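The weights, penalty, cap, and thresholds above combine into a short scoring routine. The sketch below is one reading of that description, not the actual ConsentLens code: the names DetectionSignals and scoreTracker are hypothetical, signals are summed additively (as the 0.6 + 0.5 = 1.1 example implies), and the penalty is assumed to apply only when the inline match is the sole signal.

```typescript
// Hypothetical input: which of the four passes matched for one tracker.
interface DetectionSignals {
  network: boolean;   // data-transmission request to a known tracker domain
  global: boolean;    // window global such as window.ga or window.fbq
  scriptSrc: boolean; // script source URL pattern
  inline: boolean;    // inline script snippet pattern
}

type Classification = "confirmed" | "probable" | "discarded";

const WEIGHTS = { network: 0.6, global: 0.5, scriptSrc: 0.3, inline: 0.2 };
const INLINE_ONLY_PENALTY = 0.15;

function scoreTracker(s: DetectionSignals): {
  confidence: number;
  classification: Classification;
} {
  // Additive aggregation of the weights for every pass that matched.
  let confidence =
    (s.network ? WEIGHTS.network : 0) +
    (s.global ? WEIGHTS.global : 0) +
    (s.scriptSrc ? WEIGHTS.scriptSrc : 0) +
    (s.inline ? WEIGHTS.inline : 0);

  // Assumption: the penalty applies when inline matching is the only signal.
  const inlineOnly = s.inline && !s.network && !s.global && !s.scriptSrc;
  if (inlineOnly) {
    confidence -= INLINE_ONLY_PENALTY;
  }

  // Clamp to [0, 1]; the text specifies the upper cap of 1.0.
  confidence = Math.min(1, Math.max(0, confidence));

  const classification: Classification =
    confidence >= 0.6 ? "confirmed" : confidence >= 0.3 ? "probable" : "discarded";
  return { confidence, classification };
}
```

Under these assumptions a network match plus a window global is capped at 1.0 and confirmed, a script source match plus an inline match sums to 0.5 and is probable, and an inline match alone falls below 0.3 and is discarded.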
Issue Generation and Severity Classification
The scanner generates compliance issues in eight categories based on the collected data. Each issue is structured with five required fields: severity (High/Medium/Low), title, description, fix (a specific code-level or configuration instruction), and evidence (the exact cookie name or network request URL that triggered the issue). Issues without evidence from the actual scan are not generated — the evidence field is a hard requirement.
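The five-field issue structure and the hard evidence requirement can be expressed as a type plus a guard. This is a minimal sketch under assumptions: the names ComplianceIssue and buildIssue are hypothetical, and returning null for a draft without evidence is one possible way to enforce the rule that such issues are never generated.

```typescript
type Severity = "High" | "Medium" | "Low";

// The five required fields described in the text.
interface ComplianceIssue {
  severity: Severity;
  title: string;
  description: string;
  fix: string;      // specific code-level or configuration instruction
  evidence: string; // exact cookie name or network request URL
}

// Hypothetical draft type: evidence may still be missing at this stage.
type IssueDraft = Omit<ComplianceIssue, "evidence"> & { evidence?: string };

// Returns a complete issue, or null when the scan produced no evidence.
function buildIssue(draft: IssueDraft): ComplianceIssue | null {
  const evidence = (draft.evidence ?? "").trim();
  if (evidence.length === 0) {
    return null; // issues without evidence are not generated
  }
  return { ...draft, evidence };
}
```

Making evidence non-optional on the final type means any code that consumes a ComplianceIssue can rely on the field being present, while the guard keeps unsupported findings out of the report.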
High-severity issues are generated for: trackers firing before consent, undisclosed third-party cookies, and absence of any consent mechanism alongside detected tracking. These map to the most serious GDPR violations: processing personal data without a valid lawful basis. Medium-severity issues cover: consent banners that lack a reject option (consent must be as easy to withhold as to give), tag managers loaded without an associated CMP, and cookies with excessive retention periods. Low-severity issues cover: first-party analytics cookies with long expiry times, and missing category disclosure in cookie notices.
Fix instructions are calibrated to the specific implementation detected. For a site using Google Tag Manager where a tracker fires before consent, the fix specifies GTM configuration steps — not generic advice about 'considering consent management'. For a site with a known CMP where the banner lacks a reject button, the fix specifies the CMP-specific configuration path to enable the reject option. This specificity is the primary differentiator of ConsentLens issue quality.
Data Handling and Scan Retention
The scanner collects and stores: the list of cookies set during the page load (names, domains, expiry, and security attributes), the network request log (URLs, resource types, and timestamps), tracker detections with confidence scores and pre-consent flags, consent banner detection results, and a screenshot of the page at load completion. No user-provided personal data from the scanned website is stored — the scanner observes and records the website's behaviour, not the content of the pages.
Scan results are stored in the ConsentLens database and associated with either an authenticated user account (for logged-in users) or treated as anonymous (for unauthenticated scans). For the public SEO pages at /scan/[domain], the stored data is the basis of the published compliance summary. Domain-level scan data is publicly accessible to support the platform's research and transparency mission.
Screenshot capture records the visual state of the page at the end of the page load phase. Screenshots are stored and displayed in the scan report to provide context for the detected issues — reviewers can see exactly what the page looked like, including any consent banner that was present, at the time of the scan. Screenshot data is not shared with third parties and is stored in the ConsentLens infrastructure.