AI Infrastructure TCO Calculator

By Julien Simon | AI Operating Partner, Fortino Capital | March 22, 2026 Pricing

Compare API costs, self-hosted GPU, and local/edge deployment for AI inference workloads.
Fill in your parameters below, then explore the analysis tabs.

Usage Parameters

Define your workload volume and operating schedule

API Pricing (per 1M tokens)

Select models from dropdowns for Providers 1-3. Provider 4 is fully custom.

Self-Hosted GPU Parameters

Cloud GPU rental with your own inference stack

Local / Edge Parameters

On-premises or edge deployment with consumer hardware

LLM API Cost Analysis

Costs for 4 API providers based on your usage inputs

Self-Hosted GPU Cost Analysis

Cloud GPU rental with managed inference infrastructure

Local / Edge Deployment Cost Analysis

On-premises deployment with consumer or edge hardware

Side-by-Side Comparison

All costs annualized. API (Best Single) = cheapest among your 4 selected providers.

Model Library — March 22, 2026 Pricing

Sources: openai.com/api/pricing, docs.anthropic.com, ai.google.dev, openrouter.ai

Model Library
Model Name	Provider	Input $/M tok	Output $/M tok	Notes
Gemini 3.1 Flash Lite	Anthropic	N/A (self-hosted)	N/A (self-hosted)	Open-weights 397B MoE (17B active), vision-language. Via OpenRouter.

Model Name	Provider	Input $/M tok	Output $/M tok	Notes
GPT-5.4 Pro	OpenAI	$30	$180	Top-tier reasoning model, Mar 2026
GPT-5.4	OpenAI	$2.5	$15	Flagship, Feb 2026
GPT-5.4 Mini	OpenAI	$0.75	$4.5	Mid-tier 5.4 variant, Mar 2026
GPT-5.4 Nano	OpenAI	$0.2	$1.25	Efficient 5.4 variant, Mar 2026
GPT-5.3	OpenAI	$1.75	$14	Chat-optimized, Mar 2026
GPT-5.2 Pro	OpenAI	$10.5	$84	Reasoning variant, 400K context (price halved Mar 2026)
GPT-5.2	OpenAI	$0.875	$7	Dec 2025 flagship (price halved Mar 2026)
GPT-5.1	OpenAI	$0.625	$5	Coding-optimized (price halved Mar 2026)
GPT-5	OpenAI	$1.25	$10	Aug 2025 flagship, 400K context
GPT-5 Mini	OpenAI	$0.25	$2	Efficient mid-tier, great value
GPT-5 Nano	OpenAI	$0.05	$0.4	Cheapest OpenAI option
GPT-4.1	OpenAI	$2	$8	Strong all-rounder, 1M context
GPT-4.1 Mini	OpenAI	$0.4	$1.6	Efficient mid-tier, 1M context
GPT-4.1 Nano	OpenAI	$0.1	$0.4	Fastest & cheapest GPT-4.1
o3	OpenAI	$2	$8	Reasoning model, price dropped Mar 2026
o3-mini	OpenAI	$1.1	$4.4	Affordable reasoning model
o4-mini	OpenAI	$1.1	$4.4	Affordable reasoning model
o1	OpenAI	$15	$60	Legacy reasoning, high-cost
Claude Opus 4.6	Anthropic	$5	$25	Most capable Anthropic model
Claude Sonnet 4.6	Anthropic	$3	$15	Opus-level performance at Sonnet pricing, 1M context
Claude Haiku 4.5	Anthropic	$1	$5	Fast & efficient, great for routing
Claude Opus 4.5	Anthropic	$5	$25	Previous flagship, same pricing as 4.6
Claude Sonnet 4.5	Anthropic	$3	$15	Previous Sonnet, same pricing as 4.6
Claude Sonnet 4	Anthropic	$3	$15	Previous generation Sonnet
Claude Haiku 3	Anthropic	$0.25	$1.25	Retiring Apr 2026
Gemini 3.1 Pro	Google	$2	$12	Latest Google flagship, Mar 2026
Gemini 3 Flash	Google	$0.5	$3	Pro-grade reasoning at Flash speed
Gemini 3.1 Flash Lite	Google	$0.25	$1.5	Cost-efficient 3.1 variant, Mar 2026
Gemini 2.5 Pro	Google	$1.25	$10	Production-ready, 1M context
Gemini 2.5 Flash	Google	$0.3	$2.5	Capable budget option (repriced Mar 2026)
Gemini 2.5 Flash-Lite	Google	$0.1	$0.4	Cost-efficient, now GA
Gemini 2.0 Flash-Lite	Google	$0.075	$0.3	Cheapest Google model, retiring Jun 2026
Grok 4.20	xAI	$2	$6	New flagship, 2M context, Mar 2026
Grok 4.1 Fast	xAI	$0.2	$0.5	2M context, very competitive pricing
Mistral Large 3	Mistral	$0.5	$1.5	675B params, via Mistral API
Mistral Medium 3	Mistral	$0.4	$2	Enterprise-grade, 131K context. Via OpenRouter.
Mistral Small 4	Mistral	$0.15	$0.6	Hybrid reasoning, multimodal, 262K context. Via OpenRouter.
DeepSeek V3.2	DeepSeek	$0.28	$0.42	Cost-effective API, strong coding/math
Qwen3 Max	Alibaba	$0.78	$3.9	Qwen flagship, 262K context. Via OpenRouter.
Qwen3.5 Plus	Alibaba	$0.26	$1.56	Qwen mid-tier. Via OpenRouter.
Qwen3.5 397B A17B	Alibaba	$0.39	$2.34	Open-weights 397B MoE (17B active), vision-language. Via OpenRouter.
Qwen3 235B A22B	Alibaba	$0.455	$1.82	Open-weights 235B MoE (22B active), Instruct. Via OpenRouter.
Kimi K2.5	Moonshot	$0.42	$2.2	Strong coding & math, 262K context. Via OpenRouter.
MiniMax M2.7	MiniMax	$0.3	$1.2	Latest MiniMax flagship. Via OpenRouter.
MiniMax M2.5	MiniMax	$0.2	$1.17	Previous MiniMax flagship. Via OpenRouter.
MiniMax M2-Her	MiniMax	$0.3	$1.2	65K context. Via OpenRouter.
Llama 4 Maverick	Meta	$0.15	$0.6	Open-weights 400B MoE (17B active). Via OpenRouter.
Llama 4 Scout	Meta	$0.08	$0.3	Open-weights, efficient Llama 4 variant. Via OpenRouter.
Arcee Trinity Nano	Arcee AI	N/A (self-hosted)	N/A (self-hosted)	Open-weights 6B MoE (1B active), 128K context. Self-hosted only.

GPU Instance Library — March 22, 2026 Pricing

Per-GPU on-demand hourly rates across major cloud providers

Regions: AWS us-east-1, GCP us-central1, Azure East US, CoreWeave US-East, Crusoe us-north1, FluidStack US, Lambda US, RunPod US, Together US, Vast.ai US (marketplace)

Instance	Provider	GPU	$/hr	VRAM (GB)	Notes
AWS - A100 40GB - p4d.24xlarge	AWS	A100	2.75	40	8-GPU node, ~$21.96/hr total
AWS - B200 - p6-b200.48xlarge	AWS	B200	14.24	192	8-GPU node, ~$113.93/hr total
AWS - H100 SXM - p5.48xlarge	AWS	H100	6.88	80	8-GPU node, ~$55.04/hr total
AWS - H200 - p5en.48xlarge	AWS	H200	7.91	141	8-GPU node, ~$63.30/hr total
AWS - L4 - g6.xlarge	AWS	L4	0.81	24	1-GPU instance
AWS - L4 - g6.12xlarge	AWS	L4	1.15	24	4-GPU node, ~$4.60/hr total
AWS - L40S - g6e.xlarge	AWS	L40S	1.86	48	1-GPU instance
AWS - L40S - g6e.12xlarge	AWS	L40S	2.62	48	4-GPU node, ~$10.49/hr total
Azure - A100 80GB - NC A100 v4	Azure	A100	3.67	80	1-GPU instance
Azure - H100 - ND H100 v5	Azure	H100	12.29	80	8-GPU node, ~$98.32/hr total
Azure - H200 - ND H200 v5	Azure	H200	13.78	141	8-GPU node, ~$110.24/hr total
Azure - MI300X - ND MI300X v5	Azure	MI300X	7.86	192	8-GPU node, ~$62.85/hr total
CoreWeave - A100 80GB	CoreWeave	A100	2.7	80	8-GPU node, ~$21.60/hr total
CoreWeave - B200	CoreWeave	B200	8.6	192	8-GPU node, ~$68.80/hr total
CoreWeave - GB200 NVL72	CoreWeave	GB200	10.5	186	4-GPU node, ~$42.00/hr total
CoreWeave - H100 SXM	CoreWeave	H100	6.16	80	8-GPU node, ~$49.24/hr total
CoreWeave - H200	CoreWeave	H200	6.31	141	8-GPU node, ~$50.44/hr total
CoreWeave - L40S	CoreWeave	L40S	2.25	48	8-GPU node, ~$18.00/hr total
Crusoe - A100 PCIe 40GB	Crusoe	A100	1.45	40	On-demand
Crusoe - A100 SXM 80GB	Crusoe	A100	1.95	80	On-demand
Crusoe - A100 PCIe 80GB	Crusoe	A100	1.65	80	On-demand
Crusoe - H100 HGX	Crusoe	H100	3.9	80	On-demand
Crusoe - H200 HGX	Crusoe	H200	4.29	141	On-demand
Crusoe - L40S	Crusoe	L40S	1	48	On-demand
Crusoe - MI300X	Crusoe	MI300X	3.45	192	On-demand
FluidStack - A100 SXM 80GB	FluidStack	A100	1.3	80	On-demand
FluidStack - H100 SXM	FluidStack	H100	2.1	80	On-demand
FluidStack - H200	FluidStack	H200	2.3	141	On-demand
FluidStack - L40S	FluidStack	L40S	1.3	48	On-demand
GCP - A100 40GB - a2-highgpu-1g	GCP	A100	3.67	40	1-GPU instance
GCP - H100 SXM - a3-highgpu-8g	GCP	H100	10.98	80	8-GPU node, ~$87.83/hr total
GCP - H200 - a3-ultragpu-8g	GCP	H200	10.85	141	8-GPU node, ~$86.76/hr total
GCP - L4 - g2-standard-4	GCP	L4	0.71	24	1-GPU instance
GCP - L4 - g2-standard-48	GCP	L4	1	24	4-GPU node, ~$4.00/hr total
Lambda - A100 SXM 40GB	Lambda	A100	1.48	40	1-GPU instance
Lambda - A100 SXM 80GB	Lambda	A100	2.06	80	8-GPU node pricing
Lambda - B200 SXM	Lambda	B200	5.74	192	8-GPU node pricing
Lambda - H100 PCIe	Lambda	H100	2.86	80	1-GPU instance
Lambda - H100 SXM	Lambda	H100	3.44	80	8-GPU node pricing
RunPod - A100 PCIe 80GB	RunPod	A100	1.19	80	Secure Cloud on-demand
RunPod - A100 SXM 80GB	RunPod	A100	1.39	80	Secure Cloud on-demand
RunPod - B200	RunPod	B200	5.98	192	Secure Cloud on-demand
RunPod - H100 PCIe	RunPod	H100	1.99	80	Secure Cloud on-demand
RunPod - H100 SXM	RunPod	H100	2.69	80	Secure Cloud on-demand
RunPod - H200 SXM	RunPod	H200	3.59	141	Secure Cloud on-demand
RunPod - L4	RunPod	L4	0.44	24	Secure Cloud on-demand
RunPod - L40S	RunPod	L40S	0.79	48	Secure Cloud on-demand
Together - B200	Together	B200	7.49	192	GPU cluster on-demand
Together - H100	Together	H100	3.49	80	GPU cluster on-demand
Together - H200	Together	H200	4.19	141	GPU cluster on-demand
Vast.ai - A100 SXM 80GB	Vast.ai	A100	0.77	80	Marketplace median
Vast.ai - B200	Vast.ai	B200	2.67	192	Marketplace pricing
Vast.ai - H100 SXM	Vast.ai	H100	1.5	80	Marketplace median
Vast.ai - H200	Vast.ai	H200	2.2	141	Marketplace median
Vast.ai - L4	Vast.ai	L4	0.34	24	Marketplace median
Vast.ai - L40S	Vast.ai	L40S	0.5	48	Marketplace median