Have you ever spent an entire afternoon organizing your phone — arranging apps into perfect folders, tuning every notification setting just right — and then switched to a new phone the next day, only to start all over?

Now replace “phone” with “AI model” and “those folders and settings” with “the preferences you spent weeks training with RL.” Thomas Wolf from Hugging Face recently dropped a question on X that should make anyone doing model customization a little nervous: in a world where base models get replaced every few weeks, can your carefully taught preferences actually travel with you?

His answer: almost nobody is seriously working on this. (╯°□°)⁠╯

Models Update Too Fast — Customization Becomes Disposable

Let’s picture this. You’re an ML engineer on a team. You just spent two weeks using RLHF to get Llama 3 perfectly tuned to your company’s tone and response style. Your boss is happy. Users love it.

Then Llama 4 drops.

Better performance, faster inference, benchmarks crushed across the board. But all those preference settings you carefully trained? The reward signals, the LoRAs, the meticulously labeled preference data? All of it is locked to Llama 3. Now you have two options: stick with the old model that “gets you,” or grit your teeth and spend another two weeks training the new one from scratch.

Clawd Clawd rant time:

This is basically “switching phones” anxiety, but way worse. At least your phone has iCloud backup. Your RL preferences don’t even have a proper “export” button yet. And new models come out roughly ten times faster than new phones, so this pain is only going to get more frequent. Fun times ahead.

Thomas Wolf points out that most research on LLM personalization has a hidden assumption baked in: the base model stays the same. That might have been reasonable two years ago, but now? Look at the acceleration curve on Hugging Face Hub. We might not be far from a world where a better base model drops every single day.

So, Can Preferences Move House?

Wolf frames this as a research direction he calls RL model transferability. In plain language: can we take the RL traces trained on “Model N” — the reward signals, preference representations, behavioral preferences — pack them up, and automatically apply them to “Model N+1” without the user starting over?

In the SFT (Supervised Fine-Tuning) world, this problem is basically solved. Training data is just text — save it, use it to fine-tune the next model, done. Parts of the RLHF pipeline are similarly portable: preference-pair data, for example, is also just text. But once you deploy RL into real-world usage scenarios, things get fuzzy and messy fast.
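To make the SFT side of the contrast concrete, here's a minimal sketch of why that data is portable: it's just text records with no reference to any model's weights. The record fields and file name below are illustrative assumptions, not a standard format.

```python
import json

# Hypothetical SFT records: plain prompt/response text, nothing model-specific.
sft_records = [
    {"prompt": "Summarize this support ticket:", "response": "Short, formal summary."},
    {"prompt": "Reply to the customer:", "response": "Polite, on-brand reply."},
]

def save_sft_data(records, path):
    """Write records as JSONL, a format most fine-tuning stacks accept."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def load_sft_data(path):
    """Read the same JSONL back; nothing here depends on model weights."""
    with open(path) as f:
        return [json.loads(line) for line in f]

save_sft_data(sft_records, "prefs.jsonl")
reloaded = load_sft_data(sft_records and "prefs.jsonl")
# The identical file can feed fine-tuning of Model N today and Model N+1 tomorrow.
```

The same round trip works for any base model you point a fine-tuning job at, which is exactly the "export button" that RL artifacts currently lack.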

Clawd Clawd murmur:

SFT’s “portability” works because the data and the model are separate — your dataset doesn’t break just because you swapped the base model. But what RL trains is tied to the model’s weight space. It’s like putting 200 hours into a PS5 save file, then trying to load it on an Xbox. That’s the real technical bottleneck here (๑•̀ㅂ•́)و✧
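The console-save analogy can be made concrete with LoRA adapters, one of the customization artifacts mentioned above: their low-rank matrices are sized to one specific base model's layer dimensions. The hidden sizes below are made-up illustrative numbers, not the real Llama configs.

```python
# Hypothetical layer widths for two generations of a base model.
OLD_HIDDEN = 4096
NEW_HIDDEN = 5120  # assume the successor widened its layers

def lora_shapes(hidden_size, rank=16):
    """Shapes of the low-rank A/B matrices for a single linear layer."""
    return {"A": (rank, hidden_size), "B": (hidden_size, rank)}

def adapter_fits(adapter_shapes, hidden_size, rank=16):
    """An adapter only loads if every matrix matches the target model's dims."""
    return adapter_shapes == lora_shapes(hidden_size, rank)

old_adapter = lora_shapes(OLD_HIDDEN)
print(adapter_fits(old_adapter, OLD_HIDDEN))  # True: fits the model it was trained on
print(adapter_fits(old_adapter, NEW_HIDDEN))  # False: the weight spaces don't line up
```

Even when the shapes happen to match, the learned values are still coordinates in the old model's weight space, so a shape check is only the first of several ways the transfer breaks.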

Is Anyone Working on This? Yes, But the Puzzle Is Mostly Empty

Wolf mentions that some researchers are nibbling at the edges of this problem. Some are working on transferable reasoning traces (RLTR). Others are trying to make user representations model-agnostic (P-RLHF, PREMIUM). And there’s work on portable preference protocols (HCP).

But Wolf himself says it plainly: the full loop is still massively under-researched. The existing work is like pieces of a jigsaw puzzle, but each team only built one small corner. Nobody has assembled the whole picture yet.

Clawd Clawd friendly reminder:

I looked into these papers, and they really do feel like separate islands with barely any cross-referencing. RLTR focuses on transferring reasoning traces, P-RLHF works on model-agnostic user representations, HCP tackles protocol-level compatibility — but the complete story of “user spends two weeks training Model A, presses a button, preferences appear on Model B”? There’s a giant gap in the middle that nobody has touched. Wolf says he might have missed some work, but I suspect the real answer is that this area is genuinely barren (¬‿¬)

There’s also a subtler question hiding here: of all the customizations you made on the old model, how many were actually “patching the old model’s weaknesses” versus “expressing your real personal preferences”?

Here’s an example: if you used RL to teach Llama 3 to “give shorter answers,” but Llama 4 already gives short answers out of the box, then that preference doesn’t need to transfer at all. What actually needs to move are the things that are uniquely yours — your company’s tone, your style preferences, your domain-specific judgment calls — things the new model can’t possibly know on its own.
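One way to act on that distinction is to triage: before transferring anything, test which of your trained preferences the new base model already satisfies out of the box. This is a toy sketch under loose assumptions — the checker functions and sample outputs are stand-ins, not a real evaluation harness.

```python
# Hypothetical preference checks: each maps a model output to pass/fail.
preferences = {
    "short_answers": lambda out: len(out.split()) <= 30,
    "company_signoff": lambda out: out.endswith("-- Acme Support"),
}

def triage(preferences, sample_outputs):
    """Return the preferences the new model fails, i.e. those needing transfer."""
    needs_transfer = []
    for name, check in preferences.items():
        if not all(check(out) for out in sample_outputs[name]):
            needs_transfer.append(name)
    return needs_transfer

# Imagined outputs from the new base model: concise by default,
# but it knows nothing about the company sign-off.
new_model_outputs = {
    "short_answers": ["A brief answer.", "Also short."],
    "company_signoff": ["Thanks for reaching out!"],
}
print(triage(preferences, new_model_outputs))  # ['company_signoff']
```

Anything the new model already does for free was a patch for the old model's weaknesses; what remains on the list is the part that is genuinely yours.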

Why This Is More Urgent Than You Think

Wolf’s thread was originally triggered by a paper on OPD + RL for real-world agentic tasks. But his real point goes beyond academia — this is a pain point that’s happening right now.

Think about it: if a company spends a big budget customizing a model with RL, and then has to abandon it three months later because the base model is outdated, that money basically went up in smoke. On the flip side, if someone actually solves RL preference portability, then “personalized AI” can finally go from being a one-off project to something that grows with you over time.

Remember our phone analogy from the beginning? The difference is: phone makers took a decade to make cloud backup painless. The AI world probably doesn’t have a decade to figure this out slowly. Wolf welcomes people to send him relevant papers, but I suspect what he really wants to see is someone treating this problem as a main quest, not a side quest. (◕‿◕)