ZeroGPU Explorers

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

akhaliq submitted a paper 1 day ago

UM-Text: A Unified Multimodal Model for Image Understanding

ybelkada authored a paper 6 days ago

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

akhaliq submitted a paper 7 days ago

ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

codelion

posted an update 20 days ago

Post

5985

Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:

→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
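
For readers unfamiliar with the term, a Warmup-Stable-Decay schedule ramps the learning rate up, holds it flat, then decays it. The sketch below shows only that general shape. The post does not say how WSD is applied during the diffusion conversion, so the function name, the step counts, the peak rate, and the linear decay are all illustrative assumptions, not the Dhara-70M training code.

```python
def wsd_lr(step, warmup_steps, stable_steps, decay_steps,
           peak_lr=3e-4, min_lr=0.0):
    """Learning rate at `step` for a generic Warmup-Stable-Decay schedule.

    All defaults are hypothetical; linear decay is an assumption
    (other decay shapes are also used in practice).
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from near zero up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: move linearly from peak_lr toward min_lr.
    decay_progress = (step - warmup_steps - stable_steps) / max(1, decay_steps)
    return peak_lr + (min_lr - peak_lr) * min(1.0, decay_progress)


# Hypothetical 1000-step run: 50 warmup, 750 stable, 200 decay.
schedule = [wsd_lr(s, 50, 750, 200) for s in range(1000)]
```

Because the stable phase keeps the rate constant, a run can in principle be resumed from a stable-phase checkpoint and decayed later, which is one reason this schedule is popular for staged training.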

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
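
To make the 50-30-20 split concrete, here is a small sketch that turns a percentage mix into per-source token budgets for a 1B-token run. It assumes the percentages map to the sources in the order listed (50% PDFs, 30% filtered web, 20% educational content); the source keys and the helper name are hypothetical.

```python
def token_budget(total_tokens, mix_percent):
    """Split a total token budget across sources by integer percentages."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    return {name: total_tokens * pct // 100 for name, pct in mix_percent.items()}


# Assumed mapping of the 50-30-20 mix to the three sources, in listed order.
mix = {"pdfs": 50, "filtered_web": 30, "educational": 20}
budget = token_budget(1_000_000_000, mix)
```

Under that assumption the run would draw 500M tokens from PDFs, 300M from filtered web, and 200M from educational content.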

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: codelion/dhara-70m

1 reply

ShoufaChen

authored a paper 21 days ago

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming

Paper • 2512.21338 • Published 22 days ago • 21

codelion

posted an update 27 days ago

Post

2394

Introducing PTS Visualizer - an interactive tool for exploring how language models reason!

Visualize pivotal tokens, thought anchors, and reasoning circuits. See which tokens and sentences significantly impact success probability, explore embedding clusters, and trace reasoning step-by-step.

Try it: codelion/pts-visualizer

Explore PTS datasets:
- Qwen3-0.6B: codelion/Qwen3-0.6B-pts
- DeepSeek-R1: codelion/DeepSeek-R1-Distill-Qwen-1.5B-pts

Or upload your own JSONL files!

GitHub: https://github.com/codelion/pts

victor

posted an update 28 days ago

Post

3085

Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

IDKiro

authored a paper about 1 month ago

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 100

ozayezerceli

authored 2 papers about 1 month ago

TurkEmbed: Turkish Embedding Model on NLI & STS Tasks

Paper • 2511.08376 • Published Nov 11, 2025 • 2

TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Paper • 2511.07595 • Published Nov 10, 2025 • 1

codelion

posted an update about 1 month ago

Post

2588

Essential AI recently released a new 8B base model, EssentialAI/rnj-1, and highlighted the importance of the data mix for pretraining:

"In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this."

This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency:
https://huggingface.co/blog/codelion/optimal-dataset-mixing

codelion

posted an update about 1 month ago

Post

2671

NotebookLM's infographics feature is amazing: it generates poster-type images from any text. Here is one I tried for my new HF article on Ellora: https://huggingface.co/blog/codelion/ellora-lora-recipes

codelion

posted an update about 1 month ago

Post

2313

Perplexity released a dataset (BrowseSafe) and benchmark to catch and prevent malicious prompt-injection instructions in real time.

We trained a prompt injection classifier on BrowseSafe using adaptive-classifier with ModernBERT-base embeddings.

74.9% F1 on detecting prompt injection in web content.
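
As a reminder of what the metric measures, F1 is the harmonic mean of precision and recall on the positive class (here, "contains a prompt injection"). The sketch below computes it from raw counts. The counts are made up purely for illustration and are not taken from the BrowseSafe evaluation.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 injections caught, 1 false alarm, 1 injection missed.
example_f1 = f1_score(true_pos=3, false_pos=1, false_neg=1)
```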

Model -> adaptive-classifier/browsesafe
Dataset -> perplexity-ai/browsesafe-bench
Repo -> https://github.com/codelion/adaptive-classifier

1 reply

ShoufaChen

authored 6 papers about 1 month ago

TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published Dec 1, 2025 • 72

codelion

posted an update about 1 month ago

Post

1610

I just published Ellora - 6 production-ready LoRA recipes for enhancing LLMs with specific capabilities. Each recipe costs under $100 to run and includes complete training code, data generation, and evaluation.

The 6 Recipes:
Recipe 1: Accuracy Recovery - Recover 75% of quantization losses with self-distillation
Recipe 2: Reasoning LoRA - Add structured thinking with GRPO (0% to 60% adoption, 75% quality boost)
Recipe 3: Tool Calling - Real execution on actual codebases
Recipe 4: Context Extension - Scale from 32K to 2M tokens (61x increase)
Recipe 5: Secure Code Generation - 97% vulnerability reduction using automated Semgrep analysis
Recipe 6: Execution-Aware World Models - Teaching models runtime behavior

Why Recipes?
Ellora provides methodologies, not frameworks. Use them with your existing tools (PEFT, LoRAX, vLLM, Unsloth, HuggingFace). Each recipe uses self-supervised data generation (Magpie approach) - no expensive human labeling required.

All recipes include Jupyter notebooks you can run immediately with clear success metrics.

GitHub: https://github.com/codelion/ellora
Full Article: https://huggingface.co/blog/codelion/ellora-lora-recipes

Built something with these recipes? I'd love to see what you create!

ozayezerceli

authored 2 papers about 2 months ago

Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs

Paper • 2511.17220 • Published Nov 21, 2025 • 17

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

Paper • 2511.16528 • Published Nov 20, 2025 • 22

codelion

posted an update about 2 months ago

Post

1993

Introducing OpenEvolve Prompt Optimizer - a Space that automatically evolves and optimizes your prompts using OpenEvolve!

This tool uses OpenEvolve to iteratively improve prompts by testing them on real datasets and evolving better versions. No more manual prompt engineering guesswork - let OpenEvolve find the optimal prompts for you.

How it works:
- Enter your initial prompt using {input} as a placeholder for dataset inputs
- Input any HuggingFace dataset name you want to use for optimization
- Specify the dataset split and field names for your use case
- Click Optimize Prompt and the system will validate everything first
- Compare your initial prompt vs the evolved best prompt side-by-side
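
The {input} placeholder in the first step can be pictured as plain template substitution over dataset rows. This is a minimal sketch, not the Space's actual code: the helper name, the example rows, and the "text" field are hypothetical.

```python
def render_prompt(template, row, input_field):
    """Fill the {input} placeholder with one dataset row's input field."""
    if "{input}" not in template:
        raise ValueError("template must contain the {input} placeholder")
    # str.replace avoids errors from other literal braces in the prompt.
    return template.replace("{input}", str(row[input_field]))


# Hypothetical rows standing in for a dataset split.
rows = [{"text": "great movie"}, {"text": "terrible plot"}]
template = "Classify the sentiment of this review: {input}"
prompts = [render_prompt(template, row, "text") for row in rows]
```

An optimizer would then score each candidate template by running the rendered prompts through a model and comparing outputs with the dataset labels.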

Try it here: algorithmicsuperintelligence/prompt-optimizer

OpenEvolve GitHub: https://github.com/algorithmicsuperintelligence/openevolve

Team members 751

zero-gpu-explorers's activity