
Hardware Latency Calibration

Our Testing Methodology: How We Subtract the Machine From the Measurement

"A reaction time test that doesn’t account for hardware latency is measuring the device, not the human. We solve that."

01

Why Calibration Matters

Every cognitive test that runs in a browser is affected by hardware. The time between your brain deciding to click and the click being registered includes several layers of technical overhead: display latency (how long your monitor takes to render a frame), input latency (how long your mouse or trackpad takes to report the click), and browser rendering latency (how long the JavaScript event loop takes to process the response).

On a 60Hz laptop with a Bluetooth trackpad, the cumulative hardware overhead can be 80–120ms. On a 360Hz gaming monitor with a wired mouse, it can be as low as 10–15ms. That is a 70–100ms gap that has nothing to do with the user’s brain — and it is larger than the entire difference between an average and an elite reaction time.

Without calibration, a person with genuinely faster neural processing on a slow laptop will score worse than a person with average processing on a high-end gaming setup. This is the fundamental unfairness that Senwitt’s calibration system is designed to eliminate.

The average human reaction time to a visual stimulus is approximately 250ms. A 60Hz monitor introduces ~16.7ms of display latency per frame. A Bluetooth input device adds ~10–30ms. Browser rendering adds ~5–15ms. Total: 30–60ms of overhead that looks like a “slow brain” but is actually slow hardware.

02

How We Measure Device Latency

Senwitt uses a client-side calibration system implemented in the `useDeviceCalibration` hook. This runs automatically before any test session begins, requiring no user action. The process takes approximately one second and is invisible to the user.

FPS detection: the calibration hook uses `requestAnimationFrame` to count the number of frames rendered over a 1-second window. This gives us the device’s effective refresh rate — the actual FPS the browser is producing, which accounts for both the monitor’s native refresh rate and any performance throttling from GPU/CPU load.

Input lag estimation: with the FPS measurement in hand, we estimate total system latency using a device-class model. The system uses the User-Agent to distinguish mobile devices from desktop/laptop environments, then combines this with the measured FPS to assign a latency estimate.

Mobile: touch latency is typically the bottleneck. Estimated at max(80ms, 1000/FPS).

High-refresh (>100 FPS): gaming-class hardware. Estimated at max(15ms, 1000/FPS).

Standard 60Hz: desktop-class. Estimated at max(35ms, 1000/FPS).

Below 55 FPS: likely a constrained laptop or older device. Estimated at max(60ms, 1000/FPS).

Source: `hooks/useDeviceCalibration.ts` — the actual calibration hook used in production.
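The device-class model above can be sketched as a small pure function. This is an illustrative reconstruction from the rules listed, not the production hook's API; the function and parameter names are assumptions.

```typescript
// Sketch of the device-class latency model described above.
// Names (estimateLatencyMs, isMobile) are illustrative, not the real hook's API.
function estimateLatencyMs(fps: number, isMobile: boolean): number {
  const frameTime = 1000 / fps; // one frame of display latency in ms
  if (isMobile) return Math.max(80, frameTime);  // touch input is the bottleneck
  if (fps > 100) return Math.max(15, frameTime); // high-refresh gaming hardware
  if (fps < 55) return Math.max(60, frameTime);  // constrained laptop / older device
  return Math.max(35, frameTime);                // standard 60Hz desktop
}
```

The `1000/FPS` term ensures the estimate never drops below one frame of display latency, whichever device class applies.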

03

The Calibration Algorithm

Once the device’s estimated latency is known, Senwitt applies a calibration offset to raw test scores. The goal is to subtract the hardware overhead and arrive at the “human remainder” — the pure neurological processing time.

Baseline normalization: our population norms (used to calculate percentiles) were established with a baseline assumption of approximately 30ms system latency — a reasonable estimate for a standard desktop with a wired mouse and a 60Hz monitor. All calibration offsets are computed relative to this baseline.

Per-test application: not all tests are affected by hardware latency equally. The calibration system applies different corrections depending on the test type.

Reaction Time: full latency offset subtracted. If your device has 80ms estimated latency and the baseline is 30ms, 50ms is subtracted from your raw score. A floor of 50ms is enforced to prevent impossible scores.

Symbol Snap (Perceptual Speed): partial correction applied. Latency affects this test less directly, so the offset is divided by 10 and added as a bonus to the count-based score.

Other tests: Number Memory, Matrix, and Color Clash are minimally affected by input latency (accuracy-based, not time-critical), so no calibration offset is applied. The raw score is used directly.
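Putting the per-test rules together, the correction step might look like the following sketch. The test-type strings and function name are illustrative assumptions; the 30ms baseline, 50ms floor, and divide-by-10 bonus come from the text above.

```typescript
// Illustrative sketch of the per-test calibration rules (names are assumptions).
const BASELINE_LATENCY_MS = 30; // norm-setting assumption: wired mouse, 60Hz monitor

function calibrate(testType: string, rawScore: number, deviceLatencyMs: number): number {
  const offset = deviceLatencyMs - BASELINE_LATENCY_MS;
  switch (testType) {
    case "reaction":    // milliseconds, lower is better: subtract full offset, 50ms floor
      return Math.max(50, rawScore - offset);
    case "symbol-snap": // count-based score: small bonus proportional to the offset
      return rawScore + offset / 10;
    default:            // accuracy-based tests pass through unchanged
      return rawScore;
  }
}
```

For example, a 250ms raw reaction time on an 80ms-latency device calibrates to 200ms, matching the worked example above.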

04

Scoring Pipeline

Every test result passes through a five-stage scoring pipeline before it is displayed to the user or used in archetype computation.

1) Raw score capture: the test engine captures the raw performance metric (milliseconds for reaction time, digit count for number memory, words-per-minute for typing, score counts for accuracy tests, and so on).

2) Calibration adjustment: if the device has been calibrated and the test type supports calibration, the hardware latency offset is applied to produce a calibrated score. Otherwise, the raw score passes through unchanged.

3) Percentile calculation: the calibrated score is converted to a percentile using a sigmoid approximation of the cumulative distribution function. Each test type has its own population norm (mean and standard deviation). The z-score is computed, then mapped through a logistic function, `p = 1 / (1 + e^(-1.7z))`, and scaled to a 0–100 range. For inverse tests (where lower is better, like reaction time), the z-score direction is flipped before mapping. The result is clamped to 1–99.
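The percentile step can be sketched directly from that formula. The function name and signature are illustrative; the 1.7 steepness constant, the z-flip for inverse tests, and the 1–99 clamp follow the description above.

```typescript
// Sketch of the logistic percentile mapping (names are illustrative).
function percentile(score: number, mean: number, sd: number, lowerIsBetter = false): number {
  let z = (score - mean) / sd;
  if (lowerIsBetter) z = -z; // flip direction for inverse tests like reaction time
  const p = 100 / (1 + Math.exp(-1.7 * z)); // logistic CDF scaled to 0–100
  return Math.min(99, Math.max(1, Math.round(p))); // clamp to 1–99
}
```

A score exactly at the population mean maps to the 50th percentile, and extreme outliers saturate smoothly at the clamp boundaries rather than falling off an empirical table.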

4) Domain score update: each test maps to one of five cognitive domains (Reaction, Memory, Processing, Language, Focus). The percentile updates your domain score only if it exceeds the current value (we track your peak, not your average, for archetype assignment).

5) Composite score (Senwitt Score): the five domain scores are combined into a weighted composite: Reaction (25%) + Memory (25%) + Processing (20%) + Language (15%) + Focus (15%), then multiplied by 10 to produce a 0–1000 scale. This composite determines your league placement and global rank.
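The composite weighting is straightforward to express in code. This is a minimal sketch assuming domain scores are 0–100 percentiles; the interface and function names are illustrative, while the weights and the ×10 scaling come from the text above.

```typescript
// Sketch of the weighted Senwitt Score composite (names are assumptions).
interface DomainScores {
  reaction: number;   // each domain score is a 0–100 percentile
  memory: number;
  processing: number;
  language: number;
  focus: number;
}

function senwittScore(d: DomainScores): number {
  const weighted =
    d.reaction * 0.25 + d.memory * 0.25 + d.processing * 0.20 +
    d.language * 0.15 + d.focus * 0.15;
  return Math.round(weighted * 10); // map the 0–100 weighted average to 0–1000
}
```

A user at the 100th percentile in every domain would score the full 1000.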

05

Statistical Methods

Why median over mean: for timed tests (reaction time, aim trainer), Senwitt uses the median of trial responses rather than the mean. A single anticipation error or a distraction spike can skew a mean by 50–100ms but leaves the median virtually untouched. The median gives us the truest measure of your “typical” performance.
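The robustness argument is easy to see with a sketch. In the example below, one 900ms distraction spike drags the mean of five trials up by well over 100ms while the median stays at a typical trial value.

```typescript
// Sketch: median of trial times, robust against a single outlier trial.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]                          // odd count: middle element
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: average the two middle elements
}

// Five trials with one distraction spike: median 251ms, mean ~378.8ms.
const trials = [240, 255, 248, 900, 251];
```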

Glicko-2 for competitive rating: for the competitive leaderboard system, Senwitt uses the Glicko-2 rating algorithm (used in chess and esports). Glicko-2 provides Rating (r), Rating Deviation (RD), and Volatility.

Percentile normalization: population norms for each test are pre-computed from aggregate performance data. The sigmoid approximation (logistic CDF) provides a smooth, continuous percentile mapping that avoids the discontinuities of empirical percentile tables and handles extreme outliers gracefully.

06

Device Classes

The calibration system classifies every device into one of four cohorts. This classification determines the latency estimate used for score calibration and enables device-fair leaderboard filtering.

Mobile: Variable (30–120 FPS), ~80ms+. Touch input introduces the highest latency. Largest offset applied.

Laptop: Below 55 FPS, ~60ms. Older or constrained devices, often with integrated graphics throttling. Moderate offset applied.

Desktop (Standard): 55–100 FPS, ~35ms. Baseline device class. Minimal offset applied.

High-refresh: 100+ FPS, ~15ms. Gaming-class hardware. Smallest calibration benefit (hardware already close to zero-lag).
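The four cohorts above follow directly from the measured FPS and the mobile check. This is an illustrative classification sketch; the cohort names and function signature are assumptions, not the production schema.

```typescript
// Illustrative cohort classification matching the four device classes above.
type DeviceClass = "mobile" | "laptop" | "desktop" | "high-refresh";

function classifyDevice(fps: number, isMobile: boolean): DeviceClass {
  if (isMobile) return "mobile";       // touch input: highest latency cohort
  if (fps > 100) return "high-refresh"; // gaming-class hardware
  if (fps < 55) return "laptop";        // constrained or older device
  return "desktop";                     // standard 60Hz baseline
}
```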

Device class is stored in the user’s calibration record, enabling the leaderboard to show rankings globally and within your cohort.

07

Frequently Asked Questions

Does calibration make my score higher? Not necessarily. Calibration makes your score more accurate. On a high-latency device, calibrated reaction time will be lower (faster) than raw time because hardware overhead is subtracted. On a low-latency gaming setup, calibration barely changes the score.

Can I game the system by faking a high-latency device? Calibration is measured, not self-reported. It uses actual frame timing. You cannot fake a lower FPS without actually throttling the device, which would also degrade performance.

Why don’t you calibrate all tests? Only tests where the measured outcome is directly affected by input/display latency receive calibration. Accuracy-based tests are not meaningfully impacted by a few extra milliseconds of display lag.

How accurate is the latency estimation? The FPS-based model is an approximation, not a direct measurement of end-to-end input latency. It’s accurate to within ~±15ms for most device classes, which is sufficient to eliminate 60–100ms hardware unfairness while accepting minor residual variation.

What is the “human remainder”? The portion of reaction time that is purely biological after hardware overhead has been subtracted — the time to detect the stimulus, plan the response, and execute it.

Do you recalibrate between tests? Calibration runs once when the testing session begins and persists for that session. Re-running between tests would add latency without improving accuracy.

Where can I learn more about your scoring algorithms? Percentile calculation, domain mapping, archetype assignment, and composite scoring are documented in the codebase and linked throughout the Science section.