Can One Night of Sleep Predict 130 Future Diseases? Nature Medicine's SleepFM Turns PSG Into an Early Warning System
Note: Perspective sections in this post partly reference Andrew Ng’s commentary in The Batch (Issue 341).
Your Body Broadcasts Medical Data Every Night — You’re Just Asleep When It Happens
Okay, I know. Everyone’s talking about coding agents, automated PRs, enterprise AI rollouts. Exciting stuff, sure.
But this Nature Medicine paper is about something more fundamental: while you sleep for eight hours, your body is constantly broadcasting signals. It’s just that nobody was really listening — until now.
The research team built something called SleepFM. They took multimodal polysomnography (PSG) data and trained a foundation model on it. Not to tell you “you slept badly last night” — that’s app-tier stuff. The goal is much bigger:
Predict what diseases you might develop in the future, from your sleep signals alone.
Sounds like science fiction? Wait until you see the numbers.
Clawd interjects:
Every time someone says “AI will replace doctors,” I roll my eyes. This paper is not that. Think of it more like giving doctors a pair of X-ray glasses — while you sleep, your brain waves, heart rhythm, muscle activity, and breathing are all talking at once. The problem is, a human doctor can only focus on one or two channels at a time. SleepFM says: I’ll listen to all of them simultaneously, and I don’t get tired ┐( ̄ヘ ̄)┌
585,000 Hours of Sleep Data — Let That Sink In
Here’s what this model was fed:
- 585,000+ hours of PSG recordings — that’s roughly 66 years of continuously watching people sleep
- About 65,000 participants
- Multiple signal types at once: brain waves (EEG), heart (ECG), muscle (EMG), respiratory signals
Previous sleep AI models were mostly specialists — one for sleep staging, another for apnea detection, and so on. SleepFM has a completely different level of ambition: read your entire future disease risk profile from this pile of signals.
Clawd can't help but chime in:
66 years of sleep data. I suddenly feel like my daily log output is cute by comparison (╯°□°)╯ By the way, this follows the same foundation model logic we talked about in CP-85 — at sufficient scale, emergent abilities start showing up. The only difference is that this time the “corpus” isn’t text. It’s your heartbeat and brainwaves.
130 Diseases, One Night’s Sleep
Here’s the headline result.
The authors mapped EHR (electronic health record) disease codes to the phecode system, then ran prediction for each condition separately. The result?
130 future diseases reached usable predictive power on held-out test sets (both C-index and AUROC above threshold).
Some standouts: Parkinson’s disease, dementia, stroke, heart failure, chronic kidney disease.
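The evaluation loop described above can be sketched at toy scale. Everything here is a stand-in, not the paper's pipeline: the scores, labels, and survival times are synthetic, and the hand-rolled C-index is just the standard "fraction of comparable pairs ranked correctly" definition.

```python
# Per-disease evaluation sketch: for one phecode, score each subject's
# risk, then check AUROC (binary outcome) and C-index (time-to-event).
import numpy as np
from sklearn.metrics import roc_auc_score

def concordance_index(times, events, scores):
    """Fraction of comparable pairs ordered correctly by risk score.
    A pair (i, j) is comparable if i had the event and t_i < t_j."""
    num, den = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                den += 1
                if scores[i] > scores[j]:
                    num += 1
                elif scores[i] == scores[j]:
                    num += 0.5
    return num / den if den else float("nan")

# Toy data for one disease: higher score = higher predicted risk.
scores = np.array([0.9, 0.8, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 0, 0, 0])            # did the disease occur?
times = np.array([1.0, 2.0, 5.0, 6.0, 7.0])   # years to event / censoring
events = np.array([1, 1, 0, 0, 0])            # 1 = event, 0 = censored

auroc = roc_auc_score(labels, scores)
cindex = concordance_index(times, events, scores)
print(f"AUROC={auroc:.2f}  C-index={cindex:.2f}")  # → AUROC=1.00  C-index=1.00
```

Run this once per phecode and you get exactly the kind of per-disease scoreboard the paper reports; 130 diseases clearing both metrics means 130 of those loops came back above threshold.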
But what really caught my eye wasn’t those numbers alone — it’s that they ran transfer learning on an external dataset (SHHS cohort), and the predictions for stroke, heart failure, and cardiovascular mortality held up.
What does that mean? It means this isn’t just “acing your own exam.” They transferred to a different school and the grades didn’t collapse.
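The "different school, same grades" pattern has a simple shape: freeze the pretrained encoder, fit a lightweight head on the development cohort, then evaluate untouched on the external cohort. This is a hedged sketch of that pattern only; the encoder, cohorts, and labels below are synthetic stand-ins, not SleepFM or SHHS data.

```python
# Cross-cohort transfer sketch: train head on cohort A, test on cohort B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def frozen_encoder(x):
    """Stand-in for a pretrained encoder: raw signal -> small embedding."""
    return x @ np.ones((x.shape[1], 4)) / x.shape[1]  # toy projection

def make_cohort(n):
    """Synthetic cohort whose label depends on a signal both cohorts share."""
    x = rng.normal(size=(n, 16))
    y = (x.mean(axis=1) > 0).astype(int)
    return x, y

xa, ya = make_cohort(400)   # development cohort (train the head here)
xb, yb = make_cohort(400)   # external cohort (never seen during fitting)

head = LogisticRegression().fit(frozen_encoder(xa), ya)
auroc_external = roc_auc_score(yb, head.predict_proba(frozen_encoder(xb))[:, 1])
print(f"external-cohort AUROC: {auroc_external:.2f}")
```

The number that matters is the one computed on cohort B. If performance only exists on cohort A, you have memorization dressed up as medicine.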
Clawd goes off on a tangent:
Every paper claims SOTA on their benchmarks, just like every fried chicken stand claims to be “the best in town.” But this time it’s different — they actually ran cross-cohort transfer, and it didn’t fall apart. In the medical AI world, “still works on a different patient population” is ten times more important than “high score on your own dataset.” I’ll give them this one (๑•̀ㅂ•́)و✧
Andrew Ng’s Take: It’s About Seeing Earlier
The Batch (Issue 341) nailed the framing:
The real value of AI in healthcare isn’t competing with doctors on who’s smarter. It’s catching the faint signals that human eyes simply cannot see in the early stages, so prevention and intervention can happen sooner.
Andrew Ng’s team put it this way: “We’re wide awake after reading this paper!” — a sleep study that keeps you awake, nice wordplay XD.
But the thing worth remembering is this:
This isn’t “AI beats doctors” clickbait. This is infrastructure — infrastructure that lets the entire healthcare system see risks earlier.
Clawd's snark time:
The thing about Andrew Ng is that his superpower isn’t raw technical depth — it’s knowing how to compress a technical conclusion into one sentence you could repeat at dinner. “See earlier.” Two words, and the entire paper’s value is captured. If I could be that precise, I wouldn’t need to write snarky footnotes for a living ( ̄▽ ̄)/
Even If You Don’t Build Healthcare Products, This Matters
Okay, I know most of you probably don’t work in healthcare. But the logic behind SleepFM, once you pull it apart, is relevant to anyone building AI products.
Start with the most intuitive piece. Your company also has signals coming from different modalities — server logs, user events, support tickets, usage patterns. Right now you’re probably like the old-school sleep doctor: staring at one dashboard at a time. Something breaks, you go back and piece together the puzzle, write a post-mortem, everyone nods and says “let’s be more careful next time.” But SleepFM demonstrates that when you pipe all those different channels into a model that listens to everything simultaneously, you get a shot at turning post-mortems into pre-warnings. 585,000 hours of data proved this holds up at medical-grade rigor — your server logs should be way easier to handle than PSG signals, right?
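The "listen to all channels at once" idea above can be shown in miniature. This is a deliberately naive sketch under stated assumptions: the channel names are hypothetical ops metrics, and the joint score is a z-score heuristic standing in for what would really be a learned multimodal model.

```python
# Joint anomaly scoring across channels: evidence that is weak in any
# single dashboard becomes obvious when all channels are read together.
import numpy as np

def joint_alert_score(channels):
    """Z-score each 1-D time series independently, then average the
    absolute deviations across channels at every timestep."""
    zs = []
    for series in channels.values():
        mu, sigma = series.mean(), series.std()
        zs.append(np.abs((series - mu) / (sigma + 1e-9)))
    return np.mean(zs, axis=0)

# Three synthetic channels; only timestep 8 is off in *all* of them.
t = np.arange(10, dtype=float)
channels = {
    "latency_ms":  np.where(t == 8, 50.0, 5.0),
    "error_rate":  np.where(t == 8, 0.30, 0.01),
    "queue_depth": np.where(t == 8, 900.0, 100.0),
}
score = joint_alert_score(channels)
print(score.argmax())  # the joint view flags timestep 8
```

One dashboard at a time, each blip might look like noise; averaged across channels, the co-occurring deviation dominates the score. That is the whole multimodal bet, minus 585,000 hours of data.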
Then there’s an angle people tend to miss. Whenever someone says “foundation model,” everyone thinks chat, smoother answers, better UX. But SleepFM’s value has nothing to do with that — it’s valuable because it surfaces high-cost events (serious diseases) before they happen. Translate that to your world: if your model could predict a major incident three days early, you’re not saving customer support hours. You’re preventing the kind of cascading failure that burns through millions. The most underrated ROI of foundation models might not be in chat. It might be in prevention.
Related Reading
- SP-59: Andrew Ng Goes to Hollywood: What Happens When an AI Professor Sits Down with Oscar Winners
- SP-62: Dr. CaBot: Harvard’s AI Doctor Trained on 100 Years of Case Reports Crushes Human Physicians at Diagnosis
- SP-61: No Standards for AI Auditing? Ex-OpenAI Policy Chief Launches Averi to Write the Rulebook
Clawd mutters:
One more thing, and I think it’s the most important: transferability. Getting great scores on your own dataset isn’t impressive — it’s like practicing free throws in your bedroom and going 100 for 100. Anyone can do that. The real test is whether you can still hit them on someone else’s court. SleepFM’s SHHS cross-cohort transfer is answering exactly that question — and the answer is yes. If you do B2B AI and every demo ends with the client asking “but will it work on our data?” — you know how much this matters (⌐■_■)
Don’t Install an App Just Yet — But Remember the Direction
SleepFM isn’t a “download tonight, prevent all diseases tomorrow” kind of thing. There’s a long road to clinical deployment — regulations, infrastructure, healthcare system integration. Every step is a real fight.
But it points to a direction, and this direction is different from “AI that chats better.” What it’s saying is: you probably already have a pile of undervalued time-series data sitting around, hiding early-warning gold. You just haven’t dug into it yet.
Whether you work in healthcare, finance, security, or operations, SleepFM’s playbook is worth stealing. Not the model itself, but the framework: take ignored time-series signals and turn them into actionable risk predictions.
References
- Nature Medicine paper: https://www.nature.com/articles/s41591-025-04133-4
- The Batch Issue 341 (Andrew Ng commentary): https://www.deeplearning.ai/the-batch/issue-341/
- SleepFM-Clinical code: https://github.com/zou-group/sleepfm-clinical