🏗️ Building on HF

PhysiQuanty PRO

PhysiQuanty

7 25 257

AI & ML interests

Theoretical Physics, Invariant Tokenization, Standard Model of Particle Physics Applied ML 🇫🇷

Recent Activity

upvoted an article about 3 hours ago

🔁 Teaching a 15M French LLM to think deeper — and to know when to stop 🇫🇷

reacted to RDTvlokip's post with 🚀 about 3 hours ago

reacted to RDTvlokip's post with 🔥 about 3 hours ago

Organizations

upvoted an article about 3 hours ago

Article

🔁 Teaching a 15M French LLM to think deeper — and to know when to stop 🇫🇷

RDTvlokip • 2 days ago • 3

reacted to RDTvlokip's post with 🚀🔥👍 about 3 hours ago

Post

1869

I finally changed the architecture of my 15M French LLM. It worked. Then I almost fooled myself about how much, and catching that was the real win.

After proving last time that architecture is a threshold, not a lever, I got stubborn: could I change how the model learns? Four honest attempts: Lion, a sharper AdamW β2, multi-token prediction, LayerScale. Four failures. The bottleneck wasn't the learning rule either.

So I changed the shape of the computation instead: loop the same transformer blocks 4× for deeper reasoning with zero added parameters. It beat the baseline on perplexity, the first thing in the whole project to move that number. Then I added my own twist: let each token decide how deep to think, halting on its own entropy.
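A minimal sketch of the idea, not the author's implementation: one shared block is applied up to four times, and each token stops being updated once the entropy of its predicted distribution falls below a threshold. The names `looped_forward`, `sharpen`, `identity`, and the threshold value are hypothetical, and the toy block acts on each token independently, whereas a real transformer block would also mix tokens through attention.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def looped_forward(states, block, readout, max_loops=4, entropy_threshold=0.5):
    """Apply the same `block` (shared weights) up to `max_loops` times.

    Each token halts as soon as the entropy of softmax(readout(state))
    drops below `entropy_threshold`. Returns final states and the depth
    (number of block applications) used by each token.
    """
    states = [list(s) for s in states]
    depth = [0] * len(states)
    active = [True] * len(states)
    for _ in range(max_loops):
        for i in range(len(states)):
            if not active[i]:
                continue
            states[i] = block(states[i])
            depth[i] += 1
            if entropy(softmax(readout(states[i]))) < entropy_threshold:
                active[i] = False  # this token is confident enough: stop looping
        if not any(active):
            break
    return states, depth

# Hypothetical stand-ins: the "block" doubles the logits (sharpening the
# distribution) and the readout is the identity.
def sharpen(state):
    return [2.0 * x for x in state]

def identity(state):
    return state

_, depth = looped_forward([[2.0, 0.0, 0.0], [0.1, 0.0, 0.0]], sharpen, identity)
print(depth)  # [1, 4]: the confident token halts early, the uncertain one uses every loop
```

Because the same `block` is reused on every pass, looping adds compute but no parameters, which is the property the post describes.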

My first evaluation was spectacular. Coherence up 65%. Hallucinated names down 62%.

It was noise.

Eight prompts, one seed. I re-ran on 50 prompts × 200 tokens and watched the gains shrink to "modest"; on out-of-domain prompts, recurrence actually made things worse. No universal winner. And none of it is new: it's Adaptive Computation Time (2016), the Universal Transformer (2018), and LoopViT (2026), recombined and measured honestly.
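One way to see why eight prompts can mislead is to put an interval around the metric instead of reporting a single number. The sketch below is an illustration under stated assumptions, not the author's eval harness: `bootstrap_ci` is a hypothetical helper that computes a percentile bootstrap interval for the mean of per-prompt scores, and the scores are synthetic placeholders, not results from the post.

```python
import random
import statistics

def bootstrap_ci(per_prompt_scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-prompt score.

    Resamples the prompts with replacement, recomputes the mean each
    time, and returns the (alpha/2, 1 - alpha/2) percentiles.
    """
    rng = random.Random(seed)
    n = len(per_prompt_scores)
    means = sorted(
        statistics.fmean(rng.choices(per_prompt_scores, k=n))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Synthetic per-prompt scores (1.0 = coherent, 0.0 = not); purely illustrative.
gen = random.Random(42)
small_eval = [float(gen.random() < 0.6) for _ in range(8)]
large_eval = [float(gen.random() < 0.6) for _ in range(50)]

for name, scores in (("8 prompts", small_eval), ("50 prompts", large_eval)):
    lo, hi = bootstrap_ci(scores)
    print(f"{name}: mean={statistics.fmean(scores):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

If the intervals for two models overlap heavily, the difference between them is not yet evidence. Repeating the comparison across several training seeds addresses a separate source of variance that resampling prompts alone does not capture.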

The real lesson:

A number from 8 prompts is a rumor. The eval harness that kills your own best result is worth more than the result it kills. Cite your lineage. Stay preliminary until multiple seeds say otherwise.

The three models are live. The write-up is honest about every caveat 👇

🔗 https://huggingface.co/blog/RDTvlokip/teaching-a-15m-french-llm-to-think-deeper

New activity in RDTvlokip/Cadence-15M-fr about 3 hours ago

merci

#1 opened about 3 hours ago by PhysiQuanty

liked a model about 3 hours ago

RDTvlokip/Cadence-15M-fr

Text Generation • Updated 2 days ago • 2

upvoted an article about 3 hours ago

Article

🔁 Apprendre à un LLM français de 15M à penser plus profond — et à savoir quand s'arrêter 🇫🇷

RDTvlokip • 2 days ago • 2

liked a model about 15 hours ago

Shrijanagain/H-GEMMA4-SFT-INSTRUCT

5B • Updated May 15 • 156 • 6

liked a dataset about 15 hours ago

Shrijanagain/DISTILLATION-VIBE-THINKER

Viewer • Updated 1 day ago • 29.9k • 42 • 1

reacted to Shrijanagain's post with 👀 about 15 hours ago

Post

114

Welcome, Researchers and Developers!

At SKT AI Labs, we are pushing the boundaries of AI architecture and research, and today we are thrilled to open our doors to the global research community!

We warmly welcome researchers, developers, and AI enthusiasts to join us and contribute to our R&D efforts.

🧪 What You Can Explore:

We invite you to experiment with our WMF (Weight Manifold Fusion) technology. You can test this high-dimensional fusion technique on smaller models to gain a deeper understanding of its behavior and token convergence.

---------- CHECK OUT:

SPACE : SKT-NRS/RD
EXPERIMENT : sKT-Ai-Labs/SKT-SURYA-H
DIRECT TO MAIN DISCUSSION : SKT-NRS/RD#1

🤝 Your Feedback Shapes the Future:

If it works: Fantastic! Share your results with us and contribute directly to the core vision of SKT AI Labs.

If it doesn't work: No problem at all! Your critical feedback is just as valuable to us. Every experiment and anomaly helps us refine this architecture to make it more stable and robust.

We firmly believe that true innovation stems from community collaboration and transparent testing. Let's build the future of advanced AI together. Your ideas, test results, and feedback are always welcome!

You can still carry out research and development on WMF; only the SKT-SURYA-H model has been dismissed.

Let's innovate and build together! 💡