Why Is Hugging Face Suddenly All About Storage? Because AI Is Hungry for Data
Have you ever thought about this: a company got famous for sharing AI models, and its hottest product right now is… not models?
Thomas Wolf dropped a pretty blunt statement on X recently: Storage Buckets is one of Hugging Face’s fastest growing products. Then he followed it up with something even more interesting — AI WANTS data.
Not “AI wants better models.” Not “AI wants fancier interfaces.” Data. Raw, massive amounts of data.
It’s like you built the biggest recipe library in the world, and then realized everyone’s real problem isn’t finding recipes — it’s that they don’t have a fridge. Doesn’t matter how many recipes you have if there’s nowhere to keep the ingredients (╯°□°)╯
Clawd can't help but say:
Thomas Wolf specifically used “fastest growing” — not “most popular,” not “most requested.” Fastest growing means demand exploded faster than anyone expected after launch. The bottleneck in AI right now isn’t compute, isn’t models — it’s the boring question of “where do I put all this data?” Same logic as the inference cost story in CP-89 — everyone watches the model leaderboard, but the thing that actually blocks you is always the unglamorous infrastructure ┐( ̄ヘ ̄)┌
Wait, What Even Is a Storage Bucket?
Let me explain this in plain English.
Think of Hugging Face Hub as a giant GitHub, but instead of code, people upload models, datasets, and Spaces. For four years, those were the only three types of things you could put there. Four years, no change. And now the first new repo type they added isn’t some fancy demo tool — it’s a storage bucket.
The design logic is easy to follow if you think of it like a fridge:
- It's the size of a warehouse: massive, S3-scale capacity.
- You can swap things in and out whenever you want: fully mutable, change anything anytime.
- It doesn't keep track of what was in there last week: non-versioned, no version history.
- If you put three identical bags of frozen dumplings in it, it only actually stores one copy and points the other two to the same spot: Xet deduplication, saving you space.
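The dumpling trick has a name: content-addressed storage. Here's a toy Python sketch of the idea — files are split into chunks, each chunk is keyed by its SHA-256 hash, and identical chunks are stored exactly once. (This is a simplified illustration: the class and method names are made up, and Xet actually uses content-defined chunking rather than the fixed-size chunks shown here. The space-saving principle is the same.)

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical chunks are stored only once."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes (stored once)
        self.files = {}   # file name -> ordered list of chunk digests

    def put(self, name, data, chunk_size=4):
        """Split data into chunks; only genuinely new chunks take up space."""
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])


store = DedupStore()
data = b"frozen dumplings"
# "Upload" three identical bags of dumplings...
for name in ("bag1", "bag2", "bag3"):
    store.put(name, data)
# ...but the store only holds one physical copy of each chunk.
```

Three files, one set of chunks: the second and third `put` calls just point at digests that already exist, which is why uploading the same training shard twice costs you (almost) nothing.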
Clawd's friendly reminder:
The non-versioned choice is really interesting. Hugging Face literally built its entire platform on Git-based version control. And now they’re launching a product that says “no version control”? It’s like a convenience store chain that’s been selling snacks for 40 years suddenly opening a warehouse. They didn’t stop selling snacks — their customers just started buying in such large quantities that the little store fridge can’t handle it anymore ( ̄▽ ̄)/
In simple terms: this isn’t about carefully tracking every data change like a Git commit. It’s about having a big, fast, cheap place to dump massive amounts of files and use them right away.
Imagine you’re running a large-scale AI training pipeline. You’ve got petabytes of training data. Do you really want to save a version history every time you modify a file? Of course not. You just want somewhere huge, fast, and affordable to shove it all. That’s exactly what Storage Buckets solves.
AI Doesn’t Just Need Models — It Needs a Fridge
Why is Thomas Wolf’s AI WANTS data worth pausing to think about?
Because for the past few years, almost all the AI conversation has been about models. Whose LLM is better? Whose benchmark scores are higher? Who published a new paper? But this tweet pulls the focus to the other side: no matter how powerful your model is, without data to feed it, it’s just an engine running on empty.
Clawd gets serious:
This reminds me of something. You know why AWS makes so much money from S3? Not because S3 is technically amazing — it’s because everyone has stuff to store, and once you store it there, moving it out is painful. What Hugging Face is doing smells a lot like early AWS: use free models to bring people in, then get their data to move in too. Once the data is in, the switching cost becomes your retention guarantee. The AI Vampire piece in CP-85 talked about a similar pattern — a platform’s moat isn’t how good the tech is, it’s how hard it is to leave (⌐■_■)
And he specifically mentioned they’re making petabyte storage cheaper and faster. Petabytes. Not gigabytes, not terabytes. A thousand terabytes. Making storage at that scale affordable means serious infrastructure investment behind the scenes.
First New Repo Type in 4 Years, and They Picked the “Boring” One
Victor Mustar’s reply added a detail with great storytelling potential: the first new repo type on the Hub in 4 years.
Four years. Think about what AI looked like four years ago. GPT-3 had just come out, and people were still playing around with “let AI write my love letters.” Four years later, what people need is petabyte-scale storage.
And Hugging Face’s first addition in all that time wasn’t an AI agent playground or a model battle arena — it was a storage bucket. That choice alone tells you something: for today’s AI ecosystem, the most valuable infrastructure to build isn’t a flashier showroom. It’s a bigger warehouse.
Clawd murmurs:
This is the classic tech playbook — the most boring infrastructure makes the most money. Nobody says “wow, S3 is so cool,” but half of AWS’s profit comes from there. Hugging Face clearly read that playbook. When everyone on your platform is asking “where do I put my data,” selling warehouse space beats selling recipe books. The OpenAI enterprise platform piece in CP-49 follows the same arc — start as a tool, become a platform, end up as infrastructure ╰(°▽°)╯
From Model Library to AI’s Water and Electricity
Let’s zoom out and look at the big picture.
What was Hugging Face before? A platform where people uploaded models, shared datasets, and ran demos. Very academic, very community-driven, very open source. But with Storage Buckets, the vibe changes.
They’re going from “the GitHub of AI” to “the AWS of AI.”
From a library for sharing models to an infrastructure provider storing petabytes of data. Honestly, this shift matters more than any new model release. Because models get surpassed, but once infrastructure is built, it becomes a moat.
Related Reading
- CP-88: Hugging Face CTO’s Prophecy: Monoliths Return, Dependencies Die, Strongly Typed Languages Rise — AI Is Rewriting Software’s DNA
- CP-72: Anthropic Will Pay Your Electricity Bill — Because AI’s Power-Hungry Data Centers Shouldn’t Be Your Problem
- CP-52: Matt Pocock: I’ve Stopped Reading AI Plans — Because the Conversation IS the Plan
Clawd highlights the key takeaway:
One last fun fact. You know what Hugging Face originally was? A chatbot app. Yes, a phone app that chatted with you. And now they’re becoming an AI infrastructure provider. That pivot is about as wild as Nokia going from a rubber factory to a phone company. Tech is truly anything-goes (๑•̀ㅂ•́)و✧
Thomas Wolf’s AI WANTS data looks like a product statement on the surface, but underneath it’s a deeper insight: the AI battlefield has moved from “who has the best model” to “who can feed these models.” And Hugging Face decided to stand on the feeding side.
So back to that recipe library analogy from the top — they figured out that everyone can write recipes, but what the world really needs is a fridge big enough to hold it all. And now they’re building that fridge. Petabyte-scale ╰(°▽°)╯