
Hardware Latency Calibration

Our Testing Methodology: How We Subtract the Machine From the Measurement

"A reaction time test that doesn't account for hardware latency is measuring the device, not the human. We solve that."

01

Why Calibration Matters

Every cognitive test that runs in a browser is affected by hardware. The time between your brain deciding to click and the click being registered includes several layers of technical overhead: display latency (how long your monitor takes to render a frame), input latency (how long your mouse or trackpad takes to report the click), and browser rendering latency (how long the JavaScript event loop takes to process the response).

On a 60Hz laptop with a Bluetooth trackpad, the cumulative hardware overhead can be 80-120ms. On a 360Hz gaming monitor with a wired mouse, it can be as low as 10-15ms. That is a 70-100ms gap that has nothing to do with the user's brain — and it is larger than the entire difference between an average and an elite reaction time.

Without calibration, a person with genuinely faster neural processing on a slow laptop will score worse than a person with average processing on a high-end gaming setup. This is the fundamental unfairness that SENWITT's calibration system is designed to eliminate.

The Scale of the Problem

The average human reaction time to a visual stimulus is approximately 250ms. A 60Hz monitor introduces ~16.7ms of display latency per frame. A Bluetooth input device adds ~10-30ms. Browser rendering adds ~5-15ms. Total: 30-60ms of overhead that looks like a "slow brain" but is actually slow hardware.

02

How We Measure Device Latency

SENWITT uses a client-side calibration system implemented in the useDeviceCalibration hook. This runs automatically before any test session begins, requiring no user action. The process takes approximately one second and is invisible to the user.

FPS Detection

The calibration hook uses requestAnimationFrame to count the number of frames rendered over a 1-second window. This gives us the device's effective refresh rate — the actual frames-per-second the browser is producing, which accounts for both the monitor's native refresh rate and any performance throttling from the device's GPU or CPU load.
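In outline, the frame-counting step can be factored into a pure function over requestAnimationFrame timestamps. This is an illustrative sketch, not the production hook's internals; `effectiveFps` is a hypothetical name:

```typescript
// Estimate the effective refresh rate from a series of
// requestAnimationFrame timestamps (in milliseconds). The math is
// factored out of the browser callback so it can run anywhere.
function effectiveFps(frameTimestamps: number[]): number {
  if (frameTimestamps.length < 2) return 0;
  const elapsedMs =
    frameTimestamps[frameTimestamps.length - 1] - frameTimestamps[0];
  if (elapsedMs <= 0) return 0;
  // Intervals rendered per second over the sampled window.
  return ((frameTimestamps.length - 1) / elapsedMs) * 1000;
}

// In the browser, the timestamps would be collected roughly like:
//   const stamps: number[] = [];
//   const tick = (t: number) => {
//     stamps.push(t);
//     if (t - stamps[0] < 1000) requestAnimationFrame(tick);
//   };
//   requestAnimationFrame(tick);
```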

Input Lag Estimation

With the FPS measurement in hand, we estimate total system latency using a device-class model. The system uses the User-Agent string to distinguish mobile devices from desktop/laptop environments, then combines this with the measured FPS to assign a latency estimate:

  • Mobile: Touch latency is typically the bottleneck. Estimated at max(80ms, 1000/FPS).
  • High-Refresh (>100 FPS): Gaming-class hardware. Estimated at max(15ms, 1000/FPS).
  • Standard 60Hz: Desktop-class. Estimated at max(35ms, 1000/FPS).
  • Below 55 FPS: Likely a constrained laptop or older device. Estimated at max(60ms, 1000/FPS).

Source: hooks/useDeviceCalibration.ts — the actual calibration hook used in production.
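The device-class rules above can be sketched as a single function. This is a minimal illustration of the model, assuming the checks are applied in the order listed; the production hook's exact API may differ:

```typescript
// Map measured FPS plus a mobile/desktop flag to an estimated total
// system latency in milliseconds, per the device-class rules above.
function estimateLatencyMs(fps: number, isMobile: boolean): number {
  const frameTimeMs = 1000 / fps; // one frame of display latency
  if (isMobile) return Math.max(80, frameTimeMs);  // touch is the bottleneck
  if (fps > 100) return Math.max(15, frameTimeMs); // gaming-class hardware
  if (fps < 55) return Math.max(60, frameTimeMs);  // constrained or older device
  return Math.max(35, frameTimeMs);                // standard 60Hz desktop
}
```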

03

The Calibration Algorithm

Once the device's estimated latency is known, SENWITT applies a calibration offset to raw test scores. The goal is to subtract the hardware overhead and arrive at the "human remainder" — the pure neurological processing time.

Baseline Normalization

Our population norms (used to calculate percentiles) were established with a baseline assumption of approximately 30ms system latency — a reasonable estimate for a standard desktop with a wired mouse and a 60Hz monitor. All calibration offsets are computed relative to this baseline.

Per-Test Application

Not all tests are affected by hardware latency equally. The calibration system applies different corrections depending on the test type:

Reaction Time

Full latency offset subtracted. If your device has 80ms estimated latency and the baseline is 30ms, 50ms is subtracted from your raw score. This is the most directly affected test. A floor of 50ms is enforced to prevent impossible scores.

Symbol Snap (Perceptual Speed)

Partial correction applied. Latency affects this test less directly (it measures match/mismatch decisions, not pure reaction time), so the offset is divided by 10 and added as a bonus to the count-based score.

Other Tests

Tests like Number Memory, Matrix, and Color Clash are minimally affected by input latency (they are accuracy-based, not time-critical), so no calibration offset is applied. The raw score is used directly.
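The per-test corrections above can be sketched as one adjuster. Names, the test-type grouping, and the 30ms baseline constant (from the normalization section) are illustrative assumptions, not the production implementation:

```typescript
// Baseline system latency assumed by the population norms (~30ms).
const BASELINE_LATENCY_MS = 30;

type TestType = "reaction" | "symbolSnap" | "other";

// Apply the hardware-latency correction appropriate to each test type.
function calibrateScore(
  testType: TestType,
  rawScore: number,
  deviceLatencyMs: number
): number {
  const offset = deviceLatencyMs - BASELINE_LATENCY_MS;
  switch (testType) {
    case "reaction":
      // Full offset subtracted; 50ms floor prevents impossible times.
      return Math.max(50, rawScore - offset);
    case "symbolSnap":
      // Latency matters less here: a tenth of the offset is added
      // as a bonus to the count-based score.
      return rawScore + offset / 10;
    default:
      // Accuracy-based tests are used raw.
      return rawScore;
  }
}
```

For example, an 80ms-latency device gives a 50ms offset, so a raw 280ms reaction time becomes a calibrated 230ms.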

04

Scoring Pipeline

Every test result passes through a five-stage scoring pipeline before it is displayed to the user or used in archetype computation.

1

Raw Score Capture

The test engine captures the raw performance metric: milliseconds for reaction time, digit count for number memory, words-per-minute for typing, score counts for accuracy tests, and so on.

2

Calibration Adjustment

If the device has been calibrated and the test type supports calibration, the hardware latency offset is applied to produce a calibrated score. Otherwise, the raw score passes through unchanged.

3

Percentile Calculation

The calibrated score is converted to a percentile using a sigmoid approximation of the cumulative distribution function. Each test type has its own population norm (mean and standard deviation). The z-score is computed, then mapped through a logistic function: p = 1 / (1 + e^(-1.7z)). For "inverse" tests (where lower is better, like reaction time), the z-score direction is flipped. The result is clamped to 1-99.
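Under those definitions, the percentile step can be sketched as follows (function name and norm values in the example are placeholders):

```typescript
// Convert a calibrated score to a 1-99 percentile via a logistic
// approximation of the normal CDF.
function toPercentile(
  score: number,
  mean: number,
  sd: number,
  lowerIsBetter: boolean
): number {
  let z = (score - mean) / sd;
  if (lowerIsBetter) z = -z; // flip for "inverse" tests like reaction time
  const p = 1 / (1 + Math.exp(-1.7 * z)); // logistic CDF approximation
  return Math.min(99, Math.max(1, Math.round(p * 100)));
}
```

With a hypothetical reaction-time norm of mean 250ms and SD 50ms, a 250ms score lands at the 50th percentile, while a 150ms score (two SDs faster) lands around the 97th.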

4

Domain Score Update

Each test maps to one of five cognitive domains: Reaction, Memory, Processing, Language, or Focus. The percentile is used to update the user's domain score — only if it exceeds the current value (we track your peak, not your average, for archetype assignment).

5

Composite Score (SENWITT Score)

The five domain scores are combined into a weighted composite: Reaction (25%) + Memory (25%) + Processing (20%) + Language (15%) + Focus (15%), then multiplied by 10 to produce a 0-1000 scale. This composite determines your league placement and global rank.
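The weighting above can be expressed as a short function (interface and names are illustrative):

```typescript
// The five cognitive-domain percentiles.
interface DomainScores {
  reaction: number;
  memory: number;
  processing: number;
  language: number;
  focus: number;
}

// Combine the domain percentiles into the 0-1000 composite.
function senwittScore(d: DomainScores): number {
  const weighted =
    d.reaction * 0.25 +
    d.memory * 0.25 +
    d.processing * 0.2 +
    d.language * 0.15 +
    d.focus * 0.15;
  return Math.round(weighted * 10); // scale 0-100 up to 0-1000
}
```

A user at the 50th percentile in every domain scores exactly 500.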

05

Statistical Methods

Why Median Over Mean

For timed tests (reaction time, aim trainer), SENWITT uses the median of trial responses rather than the mean. The reason is outlier resistance. A single anticipation error (clicking before the stimulus) or a distraction spike (checking your phone mid-test) can skew a mean by 50-100ms but leave the median virtually untouched. The median gives us the truest measure of your "typical" performance.
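The outlier resistance is easy to see in a small sketch (a standard median helper, not SENWITT's exact code):

```typescript
// Median of trial times in milliseconds: robust to a single outlier trial.
function medianMs(trials: number[]): number {
  const sorted = [...trials].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}
```

Given trials of [210, 195, 205, 980, 200] ms, where the 980ms trial is a distraction spike, the mean is 358ms while the median stays at 205ms.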

Glicko-2 for Competitive Rating

For the competitive leaderboard system, SENWITT uses the Glicko-2 rating algorithm — the same system used in chess, esports, and competitive gaming platforms. Glicko-2 provides three values per user:

  • Rating (r): Your estimated skill level, starting at 1200.
  • Rating Deviation (RD): The system's confidence in your rating. Lower RD = higher confidence. Decays with inactivity.
  • Volatility: How much your performance fluctuates session-to-session. A stable player has low volatility.

The Glicko-2 system is superior to simple percentile rankings for competitive contexts because it accounts for the uncertainty of your rating. A player with a 1400 rating and low RD is more reliably strong than a player with a 1500 rating and high RD.

Percentile Normalization

Population norms for each test are pre-computed from aggregate performance data. The sigmoid approximation (logistic CDF) provides a smooth, continuous percentile mapping that avoids the discontinuities of empirical percentile tables. This means even extreme outlier scores (99th or 1st percentile) are handled gracefully.

06

Device Classes

The calibration system classifies every device into one of four cohorts. This classification serves two purposes: it determines the latency estimate used for score calibration, and it enables device-fair leaderboard filtering.

Mobile

Variable (30-120 FPS) · ~80ms+

Smartphones and tablets. Touch input introduces the highest latency due to the touch digitizer, OS processing, and typically lower refresh rates. The calibration system applies the largest offset for this class.

Laptop

Below 55 FPS · ~60ms

Older or lower-powered laptops, or devices running under heavy load. Includes integrated graphics systems that throttle frame rate. Moderate calibration offset applied.

Desktop (Standard)

55-100 FPS · ~35ms

Standard 60Hz desktop monitors with wired peripherals. This is the baseline device class — closest to the norms our population statistics were built from. Minimal calibration offset.

High-Refresh

100+ FPS · ~15ms

Gaming-class hardware: 144Hz, 240Hz, or 360Hz monitors with high-polling-rate mice. Lowest hardware overhead. These users get the smallest calibration benefit — their hardware is already close to zero-lag.

Device class is also stored in the user's calibration record, enabling the leaderboard to show rankings both globally (all devices) and within your device cohort (fair comparison against similar hardware).

07

Frequently Asked Questions

Does calibration make my score higher?

Not necessarily. Calibration makes your score more accurate. If you're on a high-latency device, your calibrated reaction time will be lower (faster) than your raw time, because the hardware overhead has been subtracted. If you're on a gaming setup with minimal latency, calibration will barely change your score — your raw time was already close to your true neurological response time.

Can I game the system by faking a high-latency device?

The calibration is measured, not self-reported. It uses actual frame timing from your browser's rendering engine. You cannot fake a lower FPS without actually throttling your device, which would simultaneously degrade your test performance. The system is self-correcting.

Why don't you calibrate all tests?

Only tests where the measured outcome is directly affected by input/display latency receive calibration. Accuracy-based tests (like Number Memory or Matrix) are not meaningfully impacted by a few extra milliseconds of display lag — you're recalling digits or identifying patterns, not racing a clock to the millisecond.

How accurate is the latency estimation?

The FPS-based model is an approximation, not a direct measurement of end-to-end input latency. True input latency measurement would require specialized hardware (high-speed cameras and photodiodes). Our model is accurate to within approximately +/-15ms for most device classes, which is sufficient to eliminate the large-scale hardware unfairness (60-100ms gaps) while accepting minor residual variation.

What is the 'human remainder'?

The human remainder is our term for the portion of your reaction time that is purely biological — after hardware overhead has been subtracted. It represents the actual time your visual cortex takes to detect the stimulus, your motor cortex takes to plan the response, and your muscles take to execute it. This is what we're trying to measure.

Do you recalibrate between tests?

The calibration runs once when the testing session begins and persists for that session. Since hardware conditions rarely change within a single browsing session, re-running calibration between tests would add latency without improving accuracy.

Where can I learn more about your scoring algorithms?

The percentile calculation, domain mapping, archetype assignment, and composite scoring algorithms are all documented in the codebase. For a deeper dive into the science behind our tests, visit the science hub or explore the individual test pages for methodology-specific details.