# Token Usage Guide for Hugging Face Jobs **⚠️ CRITICAL:** Proper token usage is essential for any job that interacts with the Hugging Face Hub. ## Overview Hugging Face tokens are authentication credentials that allow your jobs to interact with the Hub. They're required for: - Pushing models/datasets to Hub - Accessing private repositories - Creating new repositories - Using Hub APIs programmatically - Any authenticated Hub operations ## Token Types ### Read Token - **Permissions:** Download models/datasets, read private repos - **Use case:** Jobs that only need to download/read content - **Creation:** https://huggingface.co/settings/tokens ### Write Token - **Permissions:** Push models/datasets, create repos, modify content - **Use case:** Jobs that need to upload results (most common) - **Creation:** https://huggingface.co/settings/tokens - **⚠️ Required for:** Pushing models, datasets, or any uploads ### Organization Token - **Permissions:** Act on behalf of an organization - **Use case:** Jobs running under organization namespace - **Creation:** Organization settings → Tokens ## Providing Tokens to Jobs ### Method 1: Automatic Token (Recommended) ⭐ ```python hf_jobs("uv", { "script": "your_script.py", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Automatic replacement }) ``` **How it works:** 1. `$HF_TOKEN` is a placeholder that gets replaced with your actual token 2. Uses the token from your logged-in session (`hf auth login`) 3. Token is encrypted server-side when passed as a secret 4. Most secure and convenient method **Benefits:** - ✅ No token exposure in code - ✅ Uses your current login session - ✅ Automatically updated if you re-login - ✅ Works seamlessly with MCP tools - ✅ Token encrypted server-side **Requirements:** - Must be logged in: `hf auth login` or `hf_whoami()` works - Token must have required permissions ### Method 2: Explicit Token (Not Recommended) ```python hf_jobs("uv", { "script": "your_script.py", "secrets": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Hardcoded token }) ``` **When to use:** - Only if automatic token doesn't work - Testing with a specific token - Organization tokens (use with caution) **Security concerns:** - ❌ Token visible in code/logs - ❌ Must manually update if token rotates - ❌ Risk of token exposure - ❌ Not recommended for production ### Method 3: Environment Variable (Less Secure) ```python hf_jobs("uv", { "script": "your_script.py", "env": {"HF_TOKEN": "hf_abc123..."} # ⚠️ Less secure than secrets }) ``` **Difference from secrets:** - `env` variables are visible in job logs - `secrets` are encrypted server-side - Always prefer `secrets` for tokens **When to use:** - Only for non-sensitive configuration - Never use for tokens (use `secrets` instead) ## Using Tokens in Scripts ### Accessing Tokens Tokens passed via `secrets` are available as environment variables in your script: ```python import os # Get token from environment token = os.environ.get("HF_TOKEN") # Verify token exists if not token: raise ValueError("HF_TOKEN not found in environment!") ``` ### Using with Hugging Face Hub **Option 1: Explicit token parameter** ```python from huggingface_hub import HfApi api = HfApi(token=os.environ.get("HF_TOKEN")) api.upload_file(...) ``` **Option 2: Auto-detection (Recommended)** ```python from huggingface_hub import HfApi # Automatically uses HF_TOKEN env var api = HfApi() # ✅ Simpler, uses token from environment api.upload_file(...) ``` **Option 3: With transformers/datasets** ```python from transformers import AutoModel from datasets import load_dataset # Auto-detects HF_TOKEN from environment model = AutoModel.from_pretrained("username/model") dataset = load_dataset("username/dataset") # For push operations, token is auto-detected model.push_to_hub("username/new-model") dataset.push_to_hub("username/new-dataset") ``` ### Complete Example ```python # /// script # dependencies = ["huggingface-hub", "datasets"] # /// import os from huggingface_hub import HfApi from datasets import Dataset # Verify token is available assert "HF_TOKEN" in os.environ, "HF_TOKEN required for Hub operations!" # Use token for Hub operations api = HfApi() # Auto-detects HF_TOKEN # Create and push dataset data = {"text": ["Hello", "World"]} dataset = Dataset.from_dict(data) # Push to Hub (token auto-detected) dataset.push_to_hub("username/my-dataset") print("✅ Dataset pushed successfully!") ``` ## Token Verification ### Check Authentication Locally ```python from huggingface_hub import whoami try: user_info = whoami() print(f"✅ Logged in as: {user_info['name']}") except Exception as e: print(f"❌ Not authenticated: {e}") ``` ### Verify Token in Job ```python import os # Check token exists if "HF_TOKEN" not in os.environ: raise ValueError("HF_TOKEN not found in environment!") token = os.environ["HF_TOKEN"] # Verify token format (should start with "hf_") if not token.startswith("hf_"): raise ValueError(f"Invalid token format: {token[:10]}...") # Test token works from huggingface_hub import whoami try: user_info = whoami(token=token) print(f"✅ Token valid for user: {user_info['name']}") except Exception as e: raise ValueError(f"Token validation failed: {e}") ``` ## Common Token Issues ### Error: 401 Unauthorized **Symptoms:** ``` 401 Client Error: Unauthorized for url: https://huggingface.co/api/... ``` **Causes:** 1. Token missing from job 2. Token invalid or expired 3. Token not passed correctly **Solutions:** 1. Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to job config 2. Verify `hf_whoami()` works locally 3. Re-login: `hf auth login` 4. Check token hasn't expired **Verification:** ```python # In your script import os assert "HF_TOKEN" in os.environ, "HF_TOKEN missing!" ``` ### Error: 403 Forbidden **Symptoms:** ``` 403 Client Error: Forbidden for url: https://huggingface.co/api/... ``` **Causes:** 1. Token lacks required permissions (read-only token used for write) 2. No access to private repository 3. Organization permissions insufficient **Solutions:** 1. Ensure token has write permissions 2. Check token type at https://huggingface.co/settings/tokens 3. Verify access to target repository 4. Use organization token if needed **Check token permissions:** ```python from huggingface_hub import whoami user_info = whoami() print(f"User: {user_info['name']}") print(f"Type: {user_info.get('type', 'user')}") ``` ### Error: Token not found in environment **Symptoms:** ``` KeyError: 'HF_TOKEN' ValueError: HF_TOKEN not found ``` **Causes:** 1. `secrets` not passed in job config 2. Wrong key name (should be `HF_TOKEN`) 3. Using `env` instead of `secrets` **Solutions:** 1. Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`) 2. Verify key name is exactly `HF_TOKEN` 3. Check job config syntax **Correct configuration:** ```python # ✅ Correct hf_jobs("uv", { "script": "...", "secrets": {"HF_TOKEN": "$HF_TOKEN"} }) # ❌ Wrong - using env instead of secrets hf_jobs("uv", { "script": "...", "env": {"HF_TOKEN": "$HF_TOKEN"} # Less secure }) # ❌ Wrong - wrong key name hf_jobs("uv", { "script": "...", "secrets": {"TOKEN": "$HF_TOKEN"} # Wrong key }) ``` ### Error: Repository access denied **Symptoms:** ``` 403 Client Error: Forbidden Repository not found or access denied ``` **Causes:** 1. Token doesn't have access to private repo 2. Repository doesn't exist and can't be created 3. Wrong namespace **Solutions:** 1. Use token from account with access 2. Verify repo visibility (public vs private) 3. Check namespace matches token owner 4. Create repo first if needed **Check repository access:** ```python from huggingface_hub import HfApi api = HfApi() try: repo_info = api.repo_info("username/repo-name") print(f"✅ Access granted: {repo_info.id}") except Exception as e: print(f"❌ Access denied: {e}") ``` ## Token Security Best Practices ### 1. Never Commit Tokens **❌ Bad:** ```python # Never do this! token = "hf_abc123xyz..." api = HfApi(token=token) ``` **✅ Good:** ```python # Use environment variable token = os.environ.get("HF_TOKEN") api = HfApi(token=token) ``` ### 2. Use Secrets, Not Environment Variables **❌ Bad:** ```python hf_jobs("uv", { "script": "...", "env": {"HF_TOKEN": "$HF_TOKEN"} # Visible in logs }) ``` **✅ Good:** ```python hf_jobs("uv", { "script": "...", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # Encrypted server-side }) ``` ### 3. Use Automatic Token Replacement **❌ Bad:** ```python hf_jobs("uv", { "script": "...", "secrets": {"HF_TOKEN": "hf_abc123..."} # Hardcoded }) ``` **✅ Good:** ```python hf_jobs("uv", { "script": "...", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # Automatic }) ``` ### 4. Rotate Tokens Regularly - Generate new tokens periodically - Revoke old tokens - Update job configurations - Monitor token usage ### 5. Use Minimal Permissions - Create tokens with only needed permissions - Use read tokens when write isn't needed - Don't use admin tokens for regular jobs ### 6. Don't Share Tokens - Each user should use their own token - Don't commit tokens to repositories - Don't share tokens in logs or messages ### 7. Monitor Token Usage - Check token activity in Hub settings - Review job logs for token issues - Set up alerts for unauthorized access ## Token Workflow Examples ### Example 1: Push Model to Hub ```python hf_jobs("uv", { "script": """ # /// script # dependencies = ["transformers"] # /// import os from transformers import AutoModel, AutoTokenizer # Verify token assert "HF_TOKEN" in os.environ, "HF_TOKEN required!" # Load and process model model = AutoModel.from_pretrained("base-model") # ... process model ... # Push to Hub (token auto-detected) model.push_to_hub("username/my-model") print("✅ Model pushed!") """, "flavor": "a10g-large", "timeout": "2h", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided }) ``` ### Example 2: Access Private Dataset ```python hf_jobs("uv", { "script": """ # /// script # dependencies = ["datasets"] # /// import os from datasets import load_dataset # Verify token assert "HF_TOKEN" in os.environ, "HF_TOKEN required!" # Load private dataset (token auto-detected) dataset = load_dataset("private-org/private-dataset") print(f"✅ Loaded {len(dataset)} examples") """, "flavor": "cpu-basic", "timeout": "30m", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided }) ``` ### Example 3: Create and Push Dataset ```python hf_jobs("uv", { "script": """ # /// script # dependencies = ["datasets", "huggingface-hub"] # /// import os from datasets import Dataset from huggingface_hub import HfApi # Verify token assert "HF_TOKEN" in os.environ, "HF_TOKEN required!" # Create dataset data = {"text": ["Sample 1", "Sample 2"]} dataset = Dataset.from_dict(data) # Push to Hub api = HfApi() # Auto-detects HF_TOKEN dataset.push_to_hub("username/my-dataset") print("✅ Dataset pushed!") """, "flavor": "cpu-basic", "timeout": "30m", "secrets": {"HF_TOKEN": "$HF_TOKEN"} # ✅ Token provided }) ``` ## Quick Reference ### Token Checklist Before submitting a job that uses Hub: - [ ] Job includes `secrets={"HF_TOKEN": "$HF_TOKEN"}` - [ ] Script checks for token: `assert "HF_TOKEN" in os.environ` - [ ] Token has required permissions (read/write) - [ ] User is logged in: `hf_whoami()` works - [ ] Token not hardcoded in script - [ ] Using `secrets` not `env` for token ### Common Patterns **Pattern 1: Auto-detect token** ```python from huggingface_hub import HfApi api = HfApi() # Uses HF_TOKEN from environment ``` **Pattern 2: Explicit token** ```python import os from huggingface_hub import HfApi api = HfApi(token=os.environ.get("HF_TOKEN")) ``` **Pattern 3: Verify token** ```python import os assert "HF_TOKEN" in os.environ, "HF_TOKEN required!" ``` ## Key Takeaways 1. **Always use `secrets={"HF_TOKEN": "$HF_TOKEN"}`** for Hub operations 2. **Never hardcode tokens** in scripts or job configs 3. **Verify token exists** in script before Hub operations 4. **Use auto-detection** when possible (`HfApi()` without token parameter) 5. **Check permissions** - ensure token has required access 6. **Monitor token usage** - review activity regularly 7. **Rotate tokens** - generate new tokens periodically