gg-hf

Activity Feed

AI & ML interests

None defined yet.


akhaliq

submitted a paper to Daily Papers 3 days ago

DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation

Paper • 2605.30350 • Published 5 days ago • 8

akhaliq

submitted a paper to Daily Papers 4 days ago

Contrastive Distribution Matching for Amortized Sequential Monte Carlo in Discrete Diffusion

Paper • 2605.23346 • Published 11 days ago

alvarobartt

posted an update 11 days ago

Open agents on AWS SageMaker AI with open models from the Hugging Face Hub!

> Deploy an open model from the Hugging Face Hub on SageMaker AI
> Connect the deployed model to Strands Agents
> Add built-in and custom tools for tool calling
> Expose external capabilities through MCP integration
> Bonus: talk to your agent and visualize traces with Gradio

https://alvarobartt.com/agents-on-aws-sagemaker

akhaliq

submitted a paper to Daily Papers 12 days ago

optimize_anything: A Universal API for Optimizing any Text Parameter

Paper • 2605.19633 • Published 14 days ago • 6

alvarobartt

posted an update 14 days ago

Latest hf-mem release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone. Each token only activates a subset of experts, but the serving setup still has to account for the full resident memory footprint.

🧠 hf-mem now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active parameter count isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
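As a rough illustration of the breakdown described above, here is a minimal sketch that separates resident weight memory from the weights a single token actually touches, and estimates per-device weight memory under Expert Parallelism. It is not hf-mem's implementation: the `MoEConfig` fields, the example numbers, and the assumptions that base weights are replicated on every device and experts divide evenly across the EP group are all hypothetical, and the KV cache is left out.

```python
from dataclasses import dataclass

# Bytes per parameter for a few common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "fp8": 1}


@dataclass
class MoEConfig:
    base_params: int        # non-expert weights: embeddings, attention, routers, shared layers
    num_experts: int        # routed experts per MoE layer
    params_per_expert: int  # params of one expert slot, summed over all MoE layers
    top_k: int              # experts each token is routed to


def weight_memory(cfg: MoEConfig, dtype: str = "bfloat16") -> dict:
    """Split weight memory into base, routed experts, resident total and per-token active."""
    b = BYTES_PER_PARAM[dtype]
    base = cfg.base_params * b
    experts = cfg.num_experts * cfg.params_per_expert * b
    active_experts = cfg.top_k * cfg.params_per_expert * b
    return {
        "base": base,
        "routed_experts": experts,
        "resident_total": base + experts,           # must be loaded to serve the model
        "active_per_token": base + active_experts,  # weights a single token passes through
    }


def weights_per_device_with_ep(cfg: MoEConfig, ep_size: int, dtype: str = "bfloat16") -> int:
    """Hypothetical EP layout: base weights replicated, experts split evenly over ep_size devices."""
    if cfg.num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size in this sketch")
    b = BYTES_PER_PARAM[dtype]
    return cfg.base_params * b + (cfg.num_experts // ep_size) * cfg.params_per_expert * b


# Hypothetical MoE; numbers chosen only for illustration.
cfg = MoEConfig(base_params=2_000_000_000, num_experts=64,
                params_per_expert=500_000_000, top_k=4)

for name, size in weight_memory(cfg).items():
    print(f"{name}: {size / 1e9:.1f} GB")
print(f"per device with EP=8: {weights_per_device_with_ep(cfg, 8) / 1e9:.1f} GB")
```

With these made-up numbers, all 68 GB of bf16 weights must be resident even though each token only passes through about 8 GB of them, and splitting experts over 8 devices brings the per-device weight share down to 12 GB.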

akhaliq

submitted a paper to Daily Papers about 1 month ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published Apr 22 • 20

akhaliq

submitted a paper to Daily Papers about 2 months ago

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

Paper • 2603.06679 • Published Mar 30 • 6

akhaliq

submitted a paper to Daily Papers 2 months ago

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Paper • 2603.24517 • Published Mar 25 • 11

mishig

posted an update 2 months ago

I like these models: nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16 and nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8, as well as the paper TradingAgents: Multi-Agents LLM Financial Trading Framework (2412.20138, https://arxiv.org/abs/2412.20138)

mlabonne/FineTome-100k

akhaliq

submitted 2 papers to Daily Papers 3 months ago

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Paper • 2603.16792 • Published Mar 17 • 3

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published Mar 13 • 44

clefourrier

authored a paper 3 months ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published Mar 12 • 65

alvarobartt

posted an update 3 months ago

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

akhaliq

submitted 3 papers to Daily Papers 4 months ago

posted an update 4 months ago

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

Running uvx hf-mem --model-id ... --experimental automatically pulls the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can set the --max-model-len, --batch-size, and --kv-cache-dtype arguments manually (à la vLLM).
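The usual back-of-the-envelope KV cache estimate for standard full attention is 2 (keys and values) × layers × KV heads × head dim × bytes per element × context length × batch size. The sketch below applies that formula to a config.json-style dict. It is an assumption that hf-mem uses exactly this formula; models with sliding-window attention, MLA, or other cache layouts need different accounting. The dtype table is a hand-picked subset, and the example config is typed in by hand with Llama-3-8B-style values rather than fetched from the Hub.

```python
# Bytes per cached element for a few KV cache dtypes (illustrative subset).
KV_DTYPE_BYTES = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
    "fp8": 1,
    "fp8_e4m3": 1,
    "fp8_e5m2": 1,
}


def kv_cache_bytes(config: dict, max_model_len: int, batch_size: int,
                   kv_cache_dtype: str = "bfloat16") -> int:
    """Estimate KV cache size in bytes for full attention (sketch, not hf-mem's code)."""
    num_layers = config["num_hidden_layers"]
    num_heads = config["num_attention_heads"]
    # GQA/MQA models cache fewer KV heads than query heads; default to MHA if unset.
    num_kv_heads = config.get("num_key_value_heads") or num_heads
    # Some configs set head_dim explicitly; otherwise derive it from hidden_size.
    head_dim = config.get("head_dim") or config["hidden_size"] // num_heads
    # Factor 2 = one key tensor and one value tensor per layer.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * KV_DTYPE_BYTES[kv_cache_dtype]
    return bytes_per_token * max_model_len * batch_size


# Hand-typed Llama-3-8B-style values (not fetched from the Hub).
example_config = {
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "hidden_size": 4096,
}

for dtype in ("bfloat16", "fp8"):
    size = kv_cache_bytes(example_config, max_model_len=8192, batch_size=1,
                          kv_cache_dtype=dtype)
    print(f"{dtype}: {size / 2**30:.2f} GiB")
# bfloat16: 1.00 GiB
# fp8: 0.50 GiB
```

Because the estimate is linear in both context length and batch size, doubling either one doubles the KV cache, which is why it can dominate over the weights at high concurrency.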