temp-org (temp-org)

posted an update 2 days ago

Quietly baking Image → Music 🎵 v3 — now running on SOTA open-source models.
👉 fffiloni/image-2-music-v3 | Feel free to test it and share feedback.

Just wiring together: merve/moondream3 + victor/ace-step-jam

Image → prompt → audio | Early version, will evolve | Follow: @fffiloni

fffiloni

posted an update 3 days ago

🚀 RB-Modulation is back on Hugging Face Spaces!

This is an older project that recently broke due to dependency changes, but it’s now fixed and running again ✅

👉 What’s fixed:
- GroundingDINO & LangSAM installation
- compatibility with recent environments
- GPU inference running smoothly again

👉 Try it here:
fffiloni/RB-Modulation

Feel free to give it a try again — feedback welcome!

victor

posted an update 12 days ago

Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). I built a Three.js racing game to evaluate it, and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: it's awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. It also built telemetry tools to compare player vs. AI speed curves and tuned the parameters from that data

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved with vector cross products that the road normals pointed DOWN, and computed track curvature normalized by arc length to tune AI cornering speed
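Both tricks in that last bullet are standard vector geometry. Here is a minimal Python sketch. It assumes a Y-up world (the Three.js default) and a track given as 2D centerline points. The helper names `face_normal` and `signed_curvature` are hypothetical, and this is not the code GLM-5.1 actually wrote:

```python
import math


def cross(u, v):
    """3D cross product u x v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def face_normal(a, b, c):
    """Unnormalized normal of triangle (a, b, c).
    Its direction depends on the vertex winding order."""
    ab = tuple(b[i] - a[i] for i in range(3))
    ac = tuple(c[i] - a[i] for i in range(3))
    return cross(ab, ac)


def signed_curvature(points):
    """Discrete curvature at each interior point of a 2D polyline:
    the turning angle between adjacent segments divided by their mean
    length (arc-length normalization). Left (CCW) turns are positive."""
    ks = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # atan2(cross, dot) gives the signed angle from segment a to segment b
        turn = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        ds = 0.5 * (math.hypot(ax, ay) + math.hypot(bx, by))
        ks.append(turn / ds)
    return ks
```

In a Y-up world, a flat road triangle `(0,0,0), (1,0,0), (0,0,1)` gives a normal with negative Y (pointing down). Swapping two vertices flips it, which is one way a winding-order bug can show up. On a circle of radius R, `signed_curvature` returns values close to 1/R, so tighter corners get larger values and would get lower target speeds.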

You're going to hear a lot about this model in the coming months. Open source, let's go, and thanks z-ai 🚀🚀

fffiloni

posted an update 15 days ago

✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but still delivers strong results — worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency issue.
It relied on parts of diffusers that no longer exist, while moving to Gradio 6 forced a much newer HF stack — and I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
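There are several ways to patch code "at runtime before import". Below is a minimal Python sketch of one of them. It assumes the downloaded module is a single `.py` file. The sketch reads the file's source, applies exact string replacements, and then executes the result as a fresh module registered in `sys.modules`. The function name `import_patched` and the patch strings are hypothetical. This is not the code the PASD Space actually uses:

```python
import importlib.util
import sys
from pathlib import Path


def import_patched(module_name, path, replacements):
    """Load the module at `path` as `module_name`, applying exact
    text replacements to its source before it executes."""
    path = Path(path)
    source = path.read_text(encoding="utf-8")
    for old, new in replacements.items():
        if old not in source:
            # Fail loudly: a silent no-op patch would hide upstream changes.
            raise ValueError(f"patch target not found: {old!r}")
        source = source.replace(old, new)
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    module = importlib.util.module_from_spec(spec)
    # Register first so later `import module_name` statements get this copy.
    sys.modules[module_name] = module
    exec(compile(source, str(path), "exec"), module.__dict__)
    return module
```

Raising an error when a patch target is missing is deliberate. If the upstream code changes, the patch fails immediately instead of quietly bringing back the old import error later. The original file on disk is never modified.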

If you’ve used it before (or are curious), feel free to give it another try.

fffiloni

posted an update 24 days ago

✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction

fffiloni

posted an update 25 days ago

AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc

fffiloni

posted an update about 1 month ago

I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄

fffiloni

posted an update about 1 month ago

A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)

victor

submitted a paper to Daily Papers about 2 months ago

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Paper • 2602.21548 • Published Feb 25

victor

posted an update 3 months ago

Interesting article: using Claude Code to help open models write CUDA kernels (for example) by turning Claude Code traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill

victor

posted an update 4 months ago

Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

multimodalart

posted an update 6 months ago

Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert an entire HF repo (Model, Dataset, or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
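The idea behind repo2txt can be sketched in a few lines of Python. The sketch walks a local copy of a repo and concatenates its text files, putting each one under a header that shows its relative path. This is not repo2txt's implementation. The extension filter and the `=====` header format are assumptions. One way to get a local copy of a Hub repo is `huggingface_hub.snapshot_download(repo_id, repo_type="space")`.

```python
from pathlib import Path

# Assumed set of "text" file types; binaries such as images or weights are skipped.
TEXT_EXTENSIONS = (".py", ".md", ".txt", ".json", ".yaml", ".yml", ".toml")


def repo_to_text(root, extensions=TEXT_EXTENSIONS):
    """Concatenate the text files under `root` into one string,
    each preceded by a header with its repo-relative path."""
    root = Path(root)
    parts = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or path.suffix not in extensions or ".git" in rel.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"===== {rel.as_posix()} =====\n{text}")
    return "\n\n".join(parts)
```

Sorting the paths keeps the output deterministic, so two dumps of the same repo can be diffed. The path headers let the LLM refer back to specific files when it suggests edits.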

multimodalart

posted an update 10 months ago

Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live, real-time demo on Spaces 📹💨

multimodalart/self-forcing

victor

posted an update 11 months ago

Open Source Avengers, Assemble! Ask an expert AI agent team to solve complex problems together 🔥

Consilium brings together multiple agents that debate and use live research (web, arXiv, SEC) to reach a consensus. You set the strategy, they find the answer.

Credit to @azettl for this awesome demo: Agents-MCP-Hackathon/consilium_mcp

victor

posted an update about 1 year ago

DIA TTS is just amazing - please share your funniest gens (here is mine) 😂
nari-labs/Dia-1.6B

fffiloni

posted an update about 1 year ago

I was thinking I need to step up my game on training Flux LoRA models. Time to have some fun! ☀️

Expect a new drop each week on aesthetics that caught my attention. Here are 3 of them that worked really well!

fffiloni/cute-comic-800
fffiloni/carbo-800
fffiloni/oniric-750

victor

posted an update about 1 year ago

Hey everyone, we've given the https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!

fffiloni

posted an update about 1 year ago

Explain like I'm 5: the latest take from @thomwolf on X about Dario's essay on DeepSeek:

— Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem—like when the internet goes down. With open-source, many people can help, making sure it keeps running smoothly.

AI isn’t just a race between two countries; it’s a team effort around the world. By sharing, we move faster and create safer technology for everyone.
—
🤗

victor

posted an update about 1 year ago

Finally, an open-source AI that turns your lyrics into full songs is here—meet YuE! Unlike other tools that only create short clips, YuE can make entire songs (up to 5 minutes) with vocals, melody, and instruments all working together. Letsss go!

m-a-p/YuE-s1-7B-anneal-en-cot

victor

posted an update over 1 year ago

Qwen/QwQ-32B-Preview shows us the future (and it's going to be exciting)...

I tested it against some really challenging reasoning prompts and the results are amazing 🤯.

Check this dataset for the results: https://huggingface.co/datasets/victor/qwq-misguided-attention

temp-org

AI & ML interests

Recent Activity

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference

Team members 4

temp-org's activity