# Hardware Selection Guide

Choosing the right hardware (flavor) is critical for cost-effective workloads.

## Available Hardware

### CPU

- `cpu-basic` - Basic CPU, testing only
- `cpu-upgrade` - Enhanced CPU

**Use cases:** Data processing, testing scripts, lightweight workloads

**Not recommended for:** Model training, GPU-accelerated workloads

### GPU Options

| Flavor | GPU | Memory | Use Case | Cost/hour |
|--------|-----|--------|----------|-----------|
| `t4-small` | NVIDIA T4 | 16GB | <1B models, demos, batch inference | ~$0.50-1 |
| `t4-medium` | NVIDIA T4 | 16GB | 1-3B models, development | ~$1-2 |
| `l4x1` | NVIDIA L4 | 24GB | 3-7B models, efficient workloads | ~$2-3 |
| `l4x4` | 4x NVIDIA L4 | 96GB | Multi-GPU workloads | ~$8-12 |
| `a10g-small` | NVIDIA A10G | 24GB | 3-7B models, production | ~$3-4 |
| `a10g-large` | NVIDIA A10G | 24GB | 7-13B models | ~$4-6 |
| `a10g-largex2` | 2x NVIDIA A10G | 48GB | Multi-GPU, large models | ~$8-12 |
| `a10g-largex4` | 4x NVIDIA A10G | 96GB | Multi-GPU, very large models | ~$16-24 |
| `a100-large` | NVIDIA A100 | 40GB | 13B+ models, fast workloads | ~$8-12 |

## Selection Guidelines

### By Workload Type

**Data Processing**

- **Recommended:** `cpu-upgrade` or `l4x1`
- **Use case:** Transform, filter, analyze datasets
- **Batch size:** Depends on data size
- **Time:** Varies by dataset size

**Batch Inference**

- **Recommended:** `a10g-large` or `a100-large`
- **Use case:** Run inference on thousands of samples
- **Batch size:** 8-32 depending on model
- **Time:** Depends on number of samples

**Experiments & Benchmarks**

- **Recommended:** `a10g-small` or `a10g-large`
- **Use case:** Reproducible ML experiments
- **Batch size:** Varies
- **Time:** Depends on experiment complexity

**Model Training** (see `model-trainer` skill for details)

- **Recommended:** See `model-trainer` skill
- **Use case:** Fine-tuning models
- **Batch size:** Depends on model size
- **Time:** Hours to days

**Synthetic Data Generation**
- **Recommended:** `a10g-large` or `a100-large`
- **Use case:** Generate datasets using LLMs
- **Batch size:** Depends on generation method
- **Time:** Hours for large datasets

### By Budget

**Minimal Budget (<$5 total)**

- Use `cpu-basic` or `t4-small`
- Process small datasets
- Quick tests and demos

**Small Budget ($5-20)**

- Use `t4-medium` or `a10g-small`
- Process medium datasets
- Run experiments

**Medium Budget ($20-50)**

- Use `a10g-small` or `a10g-large`
- Process large datasets
- Production workloads

**Large Budget ($50-200)**

- Use `a10g-large` or `a100-large`
- Large-scale processing
- Multiple experiments

### By Model Size (for inference/processing)

**Tiny Models (<1B parameters)**

- **Recommended:** `t4-small`
- **Example:** Qwen2.5-0.5B, TinyLlama
- **Batch size:** 8-16

**Small Models (1-3B parameters)**

- **Recommended:** `t4-medium` or `a10g-small`
- **Example:** Qwen2.5-1.5B, Phi-2
- **Batch size:** 4-8

**Medium Models (3-7B parameters)**

- **Recommended:** `a10g-small` or `a10g-large`
- **Example:** Qwen2.5-7B, Mistral-7B
- **Batch size:** 2-4

**Large Models (7-13B parameters)**

- **Recommended:** `a10g-large` or `a100-large`
- **Example:** Llama-3-8B
- **Batch size:** 1-2

**Very Large Models (13B+ parameters)**

- **Recommended:** `a100-large`
- **Example:** Llama-2-13B, Llama-3-70B
- **Batch size:** 1

## Memory Considerations

### Estimating Memory Requirements

**For inference:**

```
Memory (GB) ≈ (Model params in billions) × 2-4
```

**For training:**

```
Memory (GB) ≈ (Model params in billions) × 20 (full) or × 4 (LoRA)
```

**Examples:**

- Qwen2.5-0.5B inference: ~1-2GB ✅ fits t4-small
- Qwen2.5-7B inference: ~14-28GB ✅ fits a10g-large
- Qwen2.5-7B training: ~140GB ❌ not feasible without LoRA

### Memory Optimization

If hitting memory limits:

1. **Reduce batch size**

   ```python
   batch_size = 1
   ```

2. **Process in chunks**

   ```python
   for chunk in chunks:
       process(chunk)
   ```

3. **Use smaller models**
   - Use quantized models
   - Use LoRA adapters
4. **Upgrade hardware**
   - cpu → t4 → a10g → a100

## Cost Estimation

### Formula

```
Total Cost = (Hours of runtime) × (Cost per hour)
```

### Example Calculations

**Data processing:**

- Hardware: `cpu-upgrade` ($0.50/hour)
- Time: 1 hour
- Cost: $0.50

**Batch inference:**

- Hardware: `a10g-large` ($5/hour)
- Time: 2 hours
- Cost: $10.00

**Experiments:**

- Hardware: `a10g-small` ($3.50/hour)
- Time: 4 hours
- Cost: $14.00

### Cost Optimization Tips

1. **Start small:** Test on `cpu-basic` or `t4-small`
2. **Monitor runtime:** Set appropriate timeouts
3. **Optimize code:** Reduce unnecessary compute
4. **Choose the right hardware:** Don't over-provision
5. **Use checkpoints:** Resume if a job fails
6. **Monitor costs:** Check running jobs regularly

## Multi-GPU Workloads

Multi-GPU flavors make multiple GPUs available to a single job, so your code (for example, a data-parallel framework) can distribute work across them.

**Multi-GPU flavors:**

- `l4x4` - 4x L4 GPUs
- `a10g-largex2` - 2x A10G GPUs
- `a10g-largex4` - 4x A10G GPUs

**When to use:**

- Large models (>13B parameters)
- Need faster processing (near-linear speedup for parallel workloads)
- Large datasets (>100K samples)
- Parallel workloads

**Example:**

```python
hf_jobs("uv", {
    "script": "process.py",
    "flavor": "a10g-largex2",  # 2 GPUs
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}
})
```

## Choosing Between Options

### CPU vs GPU

**Choose CPU when:**

- No GPU acceleration needed
- Data processing only
- Budget constrained
- Simple workloads

**Choose GPU when:**

- Model inference/training
- GPU-accelerated libraries
- Need faster processing
- Large models

### a10g vs a100

**Choose a10g when:**

- Model <13B parameters
- Budget conscious
- Processing time not critical

**Choose a100 when:**

- Model 13B+ parameters
- Need fastest processing
- Memory requirements high
- Budget allows

### Single vs Multi-GPU

**Choose single GPU when:**

- Model <7B parameters
- Budget constrained
- Simpler debugging

**Choose multi-GPU when:**

- Model >13B parameters
- Need faster processing
- Large batch sizes required
- Cost-effective for large jobs

## Quick Reference
```python
# Workload type → hardware selection
HARDWARE_MAP = {
    "data_processing": "cpu-upgrade",
    "batch_inference_small": "t4-small",
    "batch_inference_medium": "a10g-large",
    "batch_inference_large": "a100-large",
    "experiments": "a10g-small",
    "training": "see model-trainer skill",
}
```
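The memory-estimation formulas and the flavor table above can be combined into a rough sizing helper. This is a sketch, not an official API: the function names (`estimate_memory_gb`, `smallest_flavor`) are hypothetical, the per-flavor memory figures come from the table, and the 20% headroom factor for activations/KV cache is an assumption.

```python
from typing import Optional

# GPU memory per flavor, from the table above (total across all GPUs).
FLAVOR_MEMORY_GB = {
    "t4-small": 16,
    "t4-medium": 16,
    "l4x1": 24,
    "l4x4": 96,
    "a10g-small": 24,
    "a10g-large": 24,
    "a10g-largex2": 48,
    "a10g-largex4": 96,
    "a100-large": 40,
}

# Multipliers from the estimation formulas: fp16 inference ≈ ×2 GB per
# billion params, LoRA training ≈ ×4, full fine-tuning ≈ ×20.
MODE_MULTIPLIER = {"inference": 2, "lora": 4, "full": 20}


def estimate_memory_gb(params_b: float, mode: str = "inference") -> float:
    """Rough memory estimate in GB for a model with `params_b` billion params."""
    return params_b * MODE_MULTIPLIER[mode]


def smallest_flavor(params_b: float, mode: str = "inference") -> Optional[str]:
    """Smallest-memory flavor that fits the estimate plus ~20% headroom
    for activations/KV cache (an assumption), or None if nothing fits."""
    need = estimate_memory_gb(params_b, mode) * 1.2
    for flavor, mem in sorted(FLAVOR_MEMORY_GB.items(), key=lambda kv: kv[1]):
        if mem >= need:
            return flavor
    return None
```

For example, `smallest_flavor(7)` lands on a 24GB flavor, consistent with the 3-7B recommendations above, while `smallest_flavor(7, "full")` returns `None`, echoing the "not feasible without LoRA" example in the memory section.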