3.8% vs 0% — your sleep app may be making your insomnia worse

Among 500 reviews of one popular sleep-tracking app, 19 users — 3.8% — explicitly report that the app increased their anxiety or made their insomnia worse.

Across nine non-tracking sleep apps with 7,767 reviews combined, the same complaint appears in zero of them. The number isn't large in absolute terms. The contrast — 3.8% versus 0% — is what makes it diagnostic.

I. What clinicians named it

In 2017, Kelly Glazer Baron and colleagues at Rush University published a short paper in the Journal of Clinical Sleep Medicine with a deceptively casual title: "Orthosomnia: Are Some Patients Taking the Quantified Self Too Far?"

The term was new — ortho (correct, straight) + somnia (sleep) — and the cases they described were specific. Patients arriving at sleep clinics, not because they couldn't sleep, but because their wearable said they weren't sleeping enough. The patients felt fine. The numbers said they were broken. The numbers won.

The original paper documented three cases in detail. In each, the wearable's sleep-stage estimate (itself a fairly rough computation derived from heart rate and movement) became the reference point against which the patient measured their wellbeing. When the device reported poor sleep, the patient went to bed the next night more anxious about the score, and slept worse. The act of measurement, in other words, was producing the symptom it claimed to track.

A 2023 paper, "The Tale of Orthosomnia", gave the phenomenon its current shape: a behavioral feedback loop in which sleep tracking shifts attention away from how a person feels and toward what their device says. When the device's verdict is bad, it produces anxiety that feeds back into worse sleep.

II. The numbers, from two independent angles

Population prevalence. A 2024 cross-sectional study in Brain Sciences surveyed 523 general-population adults and found that 35.8% used a sleep-tracking wearable regularly. Among those who did, a measurable subgroup met the proposed clinical criteria for orthosomnia (defined by a four-item algorithm in the paper). Critically, the orthosomnia-positive subgroup also scored worse on validated insomnia and anxiety scales than the non-orthosomnia tracker users. The devices were the same kind, but the users' psychological relationship to them differed, and so did their sleep outcomes.

App-store evidence. A separate analysis of 9,921 App Store reviews across twelve sleep apps surfaces the asymmetry directly. In Rise (a sleep-debt-tracking app, ~10 million users, App Store rating 4.7), 19 of 500 sampled reviews — 3.8% — explicitly report increased anxiety or worsened sleep tied to the app. Eleven of those 19 reviews are 1- or 2-star.

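For comparison, the same analysis pulled 7,767 reviews from nine sleep apps it groups as non-tracking (Calm, Headspace, Pzizz, BetterSleep, Tide, ShutEye, Sleep Cycle, Pillow, Hatch). The label is loose, since Sleep Cycle, Pillow, and ShutEye all record sleep. What sets Rise apart is that its headline number is a running sleep debt. The number of reviews in the comparison set explicitly reporting "the app made my anxiety/insomnia worse" is zero.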
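How solid is the 3.8%-versus-0% contrast? A few lines of standard-library Python can put rough intervals around both figures. This is a sketch under a strong assumption the analysis itself does not make: that each review is an independent random draw. App Store reviews are self-selected, and the Rise figure comes from a 500-review sample. The helper names are hypothetical. The sketch uses three standard tools: a Wilson score interval for 19/500, the exact zero-count upper bound (the "rule of three") for 0/7,767, and a one-sided Fisher exact test.

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def upper_bound_for_zero(n: int, alpha: float = 0.05) -> float:
    """One-sided upper bound on the true rate when 0 of n reviews show the complaint.
    Solves (1 - p) ** n == alpha; for alpha = 0.05 this is close to 3 / n."""
    return 1 - alpha ** (1 / n)

def prob_all_in_first_group(k: int, n1: int, n2: int) -> float:
    """Hypergeometric chance that all k flagged reviews land in group 1 (size n1)
    if flags were scattered at random over n1 + n2 reviews (one-sided Fisher exact p)."""
    return math.comb(n1, k) / math.comb(n1 + n2, k)

rise_lo, rise_hi = wilson_interval(19, 500)
comparison_hi = upper_bound_for_zero(7767)
p_all_in_rise = prob_all_in_first_group(19, 500, 7767)

print(f"Rise: 19/500 = {19 / 500:.1%} (95% CI {rise_lo:.1%} to {rise_hi:.1%})")
print(f"Comparison apps: 0/7,767 (95% upper bound {comparison_hi:.2%})")
print(f"Chance all 19 land in the Rise sample by luck alone: {p_all_in_rise:.1e}")
```

Under these assumptions, the Rise rate plausibly sits between about 2.4% and 5.9%. The comparison rate is bounded below roughly 0.04%, so the two intervals are nowhere near each other. Self-selection in who writes reviews is a bigger caveat than sampling noise.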

Three Rise reviews describe the mechanism in users' own words:

"Increased my anxiety about not getting enough sleep with the daily nagging about my sleep debt." — ★1

"Only getting a guilty feeling from this app watching my sleep debt rise throughout the week." — ★3

"Making my sleep debt higher, and my energy levels lower, which is putting stress on myself which I think is the opposite that the app is meant to do." — ★2

That phrase — guilty feeling — recurs across the dataset. It's not a vague complaint about app quality. It's the orthosomnia mechanism described in clinical literature, surfacing on its own in user reviews of a consumer product.

III. Why precise numbers do this

A useful way to see the trap: imagine a heart-rate-variability monitor that tells you, before each work meeting, "Your stress level today is 42 out of 100." That number is precise. It feels actionable. But what do you do with 42?

If 42 is bad, you start the meeting tense about being tense. If 42 is fine, you start the meeting wondering whether 42 is actually fine. The number doesn't have to be wrong to make the situation worse — it just has to draw attention to a thing that runs better unattended.

Sleep is the same kind of thing. Sleep architecture is regulated by processes that work better when the brain isn't watching. NREM/REM cycling, the body-temperature rhythm, and the cortisol awakening response all run on built-in biological schedules outside voluntary control. Bringing attention to them, especially through a precise daily score, plausibly recruits the prefrontal, evaluative kind of attention into a process that runs best without it.

The act of measuring how you slept makes you the third-person observer of yourself. Sometimes that's fine. Sometimes — for sleep, in particular — it disrupts the thing being observed.

There is also a specific sub-mechanism in daily sleep-debt apps: the reset-resistance trap. A "sleep debt" number that clears only after consecutive good nights creates a target the user cannot hit on any single night. Every morning the screen shows a bad number, even when last night was fine. The app's framework keeps the user in a state of permanent insufficiency, and anxiety follows.
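The source doesn't describe how Rise actually computes sleep debt. The sketch below therefore uses a made-up rule that produces the trap. Debt is the total shortfall against an assumed 8-hour need over the last 14 nights, and surplus nights don't pay it down. `NEED_HOURS`, `WINDOW`, and `sleep_debt` are hypothetical names, not Rise's.

```python
# Hypothetical model: NOT Rise's published algorithm, just one rule that produces the trap.
NEED_HOURS = 8.0  # assumed nightly sleep need
WINDOW = 14       # assumed look-back window, in nights

def sleep_debt(history: list[float], need: float = NEED_HOURS, window: int = WINDOW) -> float:
    """Total shortfall below `need` over the last `window` nights.

    Surplus hours are ignored, so one long night cannot cancel earlier shortfalls;
    old bad nights only stop counting once they fall out of the window."""
    recent = history[-window:]
    return sum(max(0.0, need - hours) for hours in recent)

# A rough week (6.5 h a night) followed by a week of decent 7.8 h nights.
nights = [6.5] * 7 + [7.8] * 7
for day in range(1, len(nights) + 1):
    debt = sleep_debt(nights[:day])
    print(f"morning {day:2d}: slept {nights[day - 1]:.1f} h, debt shown {debt:.1f} h")
```

In this toy run, the displayed debt keeps climbing through the decent week, from 10.5 h to about 11.9 h. Each 7.8 h night is still a small shortfall, and the bad week hasn't aged out of the window yet. Letting surplus hours pay debt down, or shortening the window, would make the number more responsive. Whether real apps do either is a design choice the source doesn't document.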

IV. Stage scoring is harder than the apps imply

There is a separate, technical problem with consumer trackers: they aren't very accurate at sleep-stage estimation in the first place.

Polysomnography (PSG), the gold-standard sleep-lab measurement, uses EEG, EMG, and EOG electrodes simultaneously to identify NREM stages 1, 2, and 3 plus REM. It requires trained technicians and direct measurement of brain electrical activity. Consumer wearables, by contrast, infer sleep stages from heart rate, heart-rate variability, and movement. Even the best of them disagree meaningfully with PSG. The disagreement is largest when separating light sleep (N1/N2) from deep sleep (N3), and REM from light sleep.
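Device-versus-PSG disagreement is usually measured epoch by epoch. Both methods score the night in 30-second epochs, and the stage labels are compared one epoch at a time. Raw percent agreement flatters a device, because much of the night is light sleep. Validation work therefore also reports Cohen's kappa, which discounts the agreement expected by chance. The sketch below assumes the four-class scheme consumer devices typically display (wake, light, deep, REM). The two hypnograms are invented for illustration.

```python
from collections import Counter

# Four-class scheme assumed here: W = wake, L = light (N1+N2), D = deep (N3), R = REM.
STAGES = ("W", "L", "D", "R")

def agreement_and_kappa(reference: str, device: str) -> tuple[float, float]:
    """Epoch-by-epoch percent agreement and Cohen's kappa for two hypnograms.

    Each string holds one stage letter per 30-second epoch."""
    if len(reference) != len(device):
        raise ValueError("both hypnograms must cover the same epochs")
    n = len(reference)
    observed = sum(r == d for r, d in zip(reference, device)) / n
    ref_counts, dev_counts = Counter(reference), Counter(device)
    # Agreement expected by chance if each scorer labelled epochs at its own base rates.
    expected = sum(ref_counts[s] * dev_counts[s] for s in STAGES) / n**2
    if expected == 1:  # both used one identical label throughout; kappa is undefined
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)

# Invented 20-epoch (10-minute) excerpts, for illustration only.
psg = "WWLLLLDDDDLLRRRRLLLW"
device = "WLLLLLLDDLLLLRRRRLLW"
obs, kappa = agreement_and_kappa(psg, device)
print(f"raw agreement {obs:.0%}, Cohen's kappa {kappa:.2f}")
print(f"deep-sleep epochs: PSG {psg.count('D')}, device {device.count('D')}")
```

Here the invented device matches 75% of epochs, but kappa is only about 0.62, and it finds just half of the deep-sleep epochs. These figures are artefacts of the made-up data, not measurements of any real device.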

The World Sleep Society released a consensus in 2025 advising clinicians not to over-interpret consumer-device sleep-stage data that patients bring to appointments. The reason: the precision of the displayed numbers (e.g., "2h 14m of REM") far exceeds the underlying measurement accuracy. The number on the user's wrist looks definitive. The science behind it isn't.

Two implications follow:

A "bad" sleep-stage breakdown on a consumer device may not reflect a bad night.
A "good" one is also a softer signal than its precision suggests.

Either way, the device asks the user to react to a measurement it cannot fully justify. That mismatch between displayed precision and real accuracy is a strange thing to put at the center of a daily routine.

V. Tracking has uses — but narrower than the marketing implies

None of this means trackers are useless. The right use case is trend, not diagnosis:

Weekly trend over months is reliable enough to detect coarse changes — the kind of thing that confirms or refutes a behavior change you're testing (e.g., started exercising, changed wake time).
Identifying severe outliers — a wildly off night that signals illness or major disruption — also works fine. A device picking up a 50% drop in time-in-bed is a real signal, regardless of stage accuracy.
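Both uses can be sketched in a few lines. Average each 7-night block to see the trend, and flag only nights that fall far below the person's own baseline. The 50% threshold comes from the example above. The time-in-bed numbers and function names are hypothetical.

```python
from statistics import mean, median

def weekly_means(nightly_hours: list[float]) -> list[float]:
    """Mean of each complete 7-night block; a trailing partial week is dropped."""
    full = len(nightly_hours) // 7 * 7
    return [mean(nightly_hours[i:i + 7]) for i in range(0, full, 7)]

def severe_outliers(nightly_hours: list[float], drop: float = 0.5) -> list[int]:
    """Indices of nights at least `drop` (default 50%) below the person's own median."""
    threshold = median(nightly_hours) * (1 - drop)
    return [i for i, hours in enumerate(nightly_hours) if hours <= threshold]

# Hypothetical time in bed (hours) for four weeks; the behavior being tested starts in week 3.
time_in_bed = [
    7.4, 6.9, 7.8, 7.1, 7.5, 8.2, 7.0,
    7.2, 7.6, 3.5, 7.3, 7.7, 8.0, 7.1,   # night 10: illness
    7.6, 7.9, 7.4, 7.8, 8.1, 8.3, 7.7,
    7.8, 7.5, 8.0, 7.9, 7.6, 8.4, 7.8,
]
print("weekly means:", [round(m, 2) for m in weekly_means(time_in_bed)])
print("nights to look at:", [i + 1 for i in severe_outliers(time_in_bed)])
```

Using the median as the baseline keeps one bad night from dragging the threshold down. The weekly mean is not protected the same way: the single 3.5 h night pulls week 2's average down. You could exclude flagged nights, or use a weekly median, before reading the trend.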

The wrong use cases — the ones the apps actively encourage and the ones with the highest risk of orthosomnia — are:

Comparing each night against the previous one. Nightly variance is enormous and mostly random. Reading meaning into it is reading meaning into noise.
Aiming at a specific number. Targeting "8 hours of sleep" or "1 hour of REM" or "zero sleep debt" turns a biological process into a performance target. Bodies don't perform.
Checking the app first thing in the morning. This is the highest-leverage moment for orthosomnia: the device's verdict arrives before any internal sense of how the night went, and overrides it.
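A simulated sleeper whose true sleep never changes shows why night-to-night comparisons mislead. The 7.2-hour average and the 0.6-hour nightly scatter are assumptions chosen for illustration, not figures from the studies above. Under these assumptions, around half of all mornings still show a "change" of more than 30 minutes.

```python
import random
from statistics import mean, pstdev

random.seed(7)       # fixed seed so the run is repeatable
TRUE_SLEEP = 7.2     # hypothetical: this person's real sleep never changes
NOISE_SD = 0.6       # hypothetical night-to-night scatter in hours (biology + sensor error)

nightly = [random.gauss(TRUE_SLEEP, NOISE_SD) for _ in range(364)]
# Absolute change from each night to the next, as a nightly comparison would show it.
swings = [abs(b - a) for a, b in zip(nightly, nightly[1:])]
share_big_swings = sum(s > 0.5 for s in swings) / len(swings)
# 364 nights split evenly into 52 weekly averages.
weekly = [mean(nightly[i:i + 7]) for i in range(0, len(nightly), 7)]

print(f"mornings showing a >30 min 'change': {share_big_swings:.0%}")
print(f"spread of nightly readings: {pstdev(nightly):.2f} h; "
      f"spread of weekly means: {pstdev(weekly):.2f} h")
```

The analytic version: the difference between two independent nights has a spread of σ√2, while a 7-night mean has σ/√7. A weekly average is therefore about √14 ≈ 3.7 times steadier than a night-to-night comparison.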

A working heuristic: how the morning feels is more reliable than how the morning's score reads. If those two disagree, the data is the thing to doubt, not the body.

VI. What's next

The next essay (You may be taking 10× too much melatonin) takes up a related gap between belief and evidence: most people use melatonin as a sleeping pill. It isn't. The 2024 meta-analysis on optimal dosing shows a 17–33× gap between common consumer doses and effective ones. That makes melatonin one of the cleanest examples of a remedy routinely taken at more than an order of magnitude above the dose the evidence supports.

If a tracker is part of your routine and is doing more harm than good, the simplest experiment available is to disable the daily score for a week and check only the weekly trend. Some users report sleeping better almost immediately. That is consistent with breaking the orthosomnia loop, although a one-week self-experiment can't rule out placebo.